
EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

cs.CL updates on arXiv.org
Pengze Guo, Jingxi Liang, Zhiwen Xie, Qifeng Wang, Derek F. Wong

arXiv:2605.08847v1 | Announce Type: new

Abstract: In the context of today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more critical than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark designed to resolve the limitations of ecological validity and noise in existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning multimodal large language models (MLLMs) on EmoS yields significant gains over zero-shot baselines, laying the foundation for training and evaluating future emotion-recognition and empathy models. The dataset and code are publicly available at https://github.com/NLP2CT/EmoS.
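The evaluation described above (fine-tuned vs. zero-shot models on segment-level emotion labels in a streaming monologue) can be sketched in miniature. The following is a hypothetical illustration only: the segment labels, emotion vocabulary, and `segment_accuracy` helper are toy stand-ins, not the actual EmoS data format or evaluation code.

```python
# Hypothetical sketch: per-segment accuracy on a streaming monologue,
# comparing a zero-shot model's predictions against a fine-tuned one's.
# All labels and predictions below are invented for illustration.
from typing import List

def segment_accuracy(preds: List[str], gold: List[str]) -> float:
    """Fraction of streaming segments whose predicted emotion label is correct."""
    assert len(preds) == len(gold), "one prediction per segment"
    correct = sum(p == g for p, g in zip(preds, gold))
    return correct / len(gold)

# Toy ground truth for a monologue whose emotion evolves over time.
gold = ["neutral", "neutral", "anxious", "anxious", "relieved"]

# Invented model outputs: the fine-tuned model tracks the emotional
# drift more closely than the zero-shot one.
zero_shot  = ["neutral", "happy", "neutral", "anxious", "happy"]
fine_tuned = ["neutral", "neutral", "anxious", "anxious", "happy"]

print(segment_accuracy(zero_shot, gold))   # 0.4
print(segment_accuracy(fine_tuned, gold))  # 0.8
```

A real evaluation on a fine-grained streaming benchmark would also need to handle continuous label trajectories and multimodal inputs, which this toy accuracy metric does not capture.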