A Multimodal Pre-trained Network for Integrated EEG-Video Seizure Detection

cs.CV updates on arXiv.org

Tong Lu, Ke Xu, Zimo Zhang, Zitong Zhao, Danwei Weng, Ruiyu Wang, Miao Liu, Zizuo Zhang, Jingyi Yao, Yixuan Zhao, Wenchao Zhang, Min Wang, Guoming Luan, Minmin Luo, Zhifeng Yue

Apr 30, 2026, 12:00 AM

arXiv:2604.26379v1

Abstract: Reliable seizure detection in mouse models is essential for preclinical epilepsy research, yet manual review of synchronized video-EEG recordings is labor-intensive, and single-modality systems fail for complementary reasons: video-based methods are easily confounded by benign behaviors, whereas EEG-based methods are vulnerable to ictal motion artifacts. We present EEGVFusion, a multimodal framework that combines self-supervised EEG representation learning, spatio-temporal video encoding, optimal-transport (OT) alignment, and bidirectional cross-attention to integrate neural and behavioral evidence. We also curate an expert-annotated dataset of synchronized EEG and video recordings comprising 93 sessions from 15 mice for training and evaluation. In the random-session split, EEGVFusion achieved a Balanced Accuracy of 0.9957 with perfect event sensitivity and an Event FAR (false-alarm rate) of 0.6250 false positives per hour (FP/h), indicating strong seizure detection performance with a low false-alarm burden. In a single held-out-subject evaluation with Subject 110 reserved for testing, EEGVFusion achieved a Balanced Accuracy of 0.9718 and reduced Event FAR from 2.7250 FP/h for the EEG-only counterpart to 0.4833 FP/h while preserving perfect event sensitivity. Targeted ablations further showed that EEG pre-training and OT alignment help reduce false alarms while preserving event sensitivity.
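The abstract reports three metrics (Balanced Accuracy, event sensitivity, and Event FAR in FP/h) without defining how they are computed. The following is a minimal sketch of one common way to compute them, not the authors' evaluation code. It assumes fixed-length binary window labels, treats any run of consecutive positive windows as one event, and counts a true event as detected if any predicted event overlaps it. The function names and the `window_seconds` parameter are hypothetical.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity over binary window labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_true & y_pred).sum() / y_true.sum()
    specificity = (~y_true & ~y_pred).sum() / (~y_true).sum()
    return 0.5 * (sensitivity + specificity)

def to_events(labels):
    """Group consecutive positive windows into (start, end) pairs, end exclusive."""
    padded = np.concatenate(([0], np.asarray(labels, dtype=int), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1)    # 1 -> 0 transitions
    return list(zip(starts.tolist(), ends.tolist()))

def overlaps(a, b):
    """True if half-open intervals a and b share at least one window."""
    return a[0] < b[1] and b[0] < a[1]

def event_metrics(y_true, y_pred, window_seconds):
    """Return (event sensitivity, false-positive events per hour).

    Assumption: the overlap-based matching rule is illustrative only;
    the paper may use a different event definition.
    """
    true_events = to_events(y_true)
    pred_events = to_events(y_pred)
    detected = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    false_alarms = sum(
        not any(overlaps(p, t) for t in true_events) for p in pred_events
    )
    hours = len(y_true) * window_seconds / 3600.0
    sensitivity = detected / len(true_events) if true_events else float("nan")
    return sensitivity, false_alarms / hours

# Toy recording: 8 windows of 450 s each, i.e. exactly one hour.
y_true = [0, 0, 1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))
print(event_metrics(y_true, y_pred, window_seconds=450))
```

In this toy example the single true event is detected because one predicted window overlaps it, while the isolated prediction at the sixth index counts as one false alarm over one hour of recording. This illustrates how a system can keep perfect event sensitivity while the false-alarm rate varies independently, which is the trade-off the abstract emphasizes.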