[2603.02123] Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy
Computer Science > Artificial Intelligence

arXiv:2603.02123 (cs) [Submitted on 2 Mar 2026]

Title: Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

Authors: Jiahao Huang, Fengyan Lin, Xuechao Yang, Chen Feng, Kexin Zhu, Xu Yang, Zhide Chen

Abstract: The development of affective multimodal language models (MLMs) has long been constrained by a gap between low-level perception and high-level interaction, leading to fragmented affective capabilities and limited generalization. To bridge this gap, we propose a cognitively inspired three-level hierarchy that organizes affective tasks by their cognitive depth (perception, understanding, and interaction) and provides a unified conceptual foundation for advancing affective modeling. Guided by this hierarchy, we introduce Nano-EmoX, a small-scale multitask MLM, and P2E (Perception-to-Empathy), a curriculum-based training framework. Nano-EmoX integrates a suite of omni-modal encoders, including an enhanced facial encoder and a fusion encoder, to capture key multimodal affective cues and improve cross-task transferability. The encoder outputs are projected into a unified language space via heterogeneous adapters, enabling a lightweight language model to tackle diverse affective tasks. Concurrently, P2E progressively cultivates emo...
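The pipeline the abstract describes (several modality encoders whose features pass through per-modality "heterogeneous" adapters into a shared language-embedding space consumed by a lightweight language model) can be summarized in code. The sketch below is a minimal, hypothetical PyTorch rendering of that data flow; the module names, feature dimensions, stand-in linear encoders, and token-concatenation step are all assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a Nano-EmoX-style architecture: omni-modal encoders
# -> heterogeneous adapters -> unified language space -> lightweight LM.
# All names, sizes, and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Projects one encoder's features into the shared language space."""

    def __init__(self, in_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class NanoEmoXSketch(nn.Module):
    def __init__(self, lm_dim: int = 512):
        super().__init__()
        # Stand-ins for the omni-modal encoders (here: audio, video, the
        # enhanced facial encoder, and a fusion encoder); real encoders
        # would be pretrained transformers with their own dimensions.
        dims = {"audio": 256, "video": 384, "face": 128, "fusion": 320}
        self.encoders = nn.ModuleDict(
            {m: nn.Linear(d, d) for m, d in dims.items()}
        )
        # One heterogeneous adapter per modality, each mapping into lm_dim.
        self.adapters = nn.ModuleDict(
            {m: Adapter(d, lm_dim) for m, d in dims.items()}
        )
        # Placeholder for the lightweight language model.
        self.lm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=lm_dim, nhead=8, batch_first=True
            ),
            num_layers=2,
        )

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Encode each modality, adapt it into the language space, and hand
        # the concatenated token sequences to the LM.
        tokens = [
            self.adapters[m](self.encoders[m](x)) for m, x in inputs.items()
        ]
        return self.lm(torch.cat(tokens, dim=1))


if __name__ == "__main__":
    model = NanoEmoXSketch()
    batch = {
        "audio": torch.randn(2, 10, 256),
        "video": torch.randn(2, 10, 384),
        "face": torch.randn(2, 10, 128),
        "fusion": torch.randn(2, 10, 320),
    }
    print(model(batch).shape)  # torch.Size([2, 40, 512])
```

The P2E curriculum can be hedged the same way: the abstract only says training proceeds by cognitive depth, so the loop below assumes nothing beyond staging task groups in hierarchy order, with the stage ordering and `train_one_epoch` callable as hypothetical placeholders.

```python
# Hypothetical P2E-style curriculum loop: train task groups in order of
# increasing cognitive depth. Stage names come from the abstract's
# hierarchy; epochs per stage and the training callable are assumptions.
CURRICULUM = ["perception", "understanding", "interaction"]


def train_p2e(model, datasets_by_level, train_one_epoch, epochs_per_stage=1):
    """Run the curriculum stages sequentially on one model."""
    for level in CURRICULUM:
        for _ in range(epochs_per_stage):
            train_one_epoch(model, datasets_by_level[level])
```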