[2603.00108] SurgFusion-Net: Diversified Adaptive Multimodal Fusion Network for Surgical Skill Assessment
Computer Science > Robotics

arXiv:2603.00108 (cs) [Submitted on 18 Feb 2026]

Title: SurgFusion-Net: Diversified Adaptive Multimodal Fusion Network for Surgical Skill Assessment

Authors: Runlong He, Freweini M. Tesfai, Matthew W. E. Boal, Nazir Sirajudeen, Dimitrios Anastasiou, Jialang Xu, Mobarak I. Hoque, Philip J. Edwards, John D. Kelly, Ashwin Sridhar, Abdolrahim Kadkhodamohammadi, Dhivya Chandrasekaran, Matthew J. Clarkson, Danail Stoyanov, Nader Francis, Evangelos B. Mazomenos

Abstract: Robotic-assisted surgery (RAS) is established in clinical practice, and automated surgical skill assessment using multimodal data offers transformative potential for surgical analytics and education. However, developing effective multimodal methods remains challenging due to the complexity of the task, limited annotated datasets, and insufficient techniques for cross-modal information fusion. Existing state-of-the-art methods rely exclusively on RGB video and apply only to dry-lab settings, failing to address the significant domain gap between controlled simulation and real clinical cases, where the surgical environment, together with camera and tissue motion, introduces substantial complexities. This work introduces SurgFusion-Net and Divergence Regulated Attention (DRA), an innovative fusion strategy for multimodal...
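The abstract is truncated before the DRA mechanism is described, so the paper's actual fusion strategy is not specified here. As a purely illustrative sketch of the general idea of cross-modal attention fusion between a video stream and a second modality (e.g. instrument kinematics), the following minimal NumPy example is a generic scaled dot-product cross-attention with a residual connection; the function name, feature shapes, and the kinematics modality are assumptions, not the paper's method:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(video_feats, kin_feats):
    """Generic cross-attention fusion (illustrative only, not DRA).

    video_feats: (T_v, d) per-frame video features acting as queries.
    kin_feats:   (T_k, d) features from a second modality (here assumed
                 to be kinematics) acting as keys and values.
    Returns fused video features of shape (T_v, d).
    """
    d = kin_feats.shape[-1]
    scores = video_feats @ kin_feats.T / np.sqrt(d)  # (T_v, T_k) similarity
    weights = softmax(scores, axis=-1)               # attention over kinematic steps
    attended = weights @ kin_feats                   # (T_v, d) modality-2 context
    return video_feats + attended                    # residual fusion

# Toy usage with random features.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16))   # 8 video frames, 16-dim features
kin = rng.standard_normal((5, 16))     # 5 kinematic steps, 16-dim features
fused = cross_modal_attention(video, kin)
```

Each video frame attends over all kinematic steps and adds the attended context back onto its own features, so the fused representation keeps the video stream's temporal length while injecting information from the second modality.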