[2603.26738] SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.26738 (cs) [Submitted on 22 Mar 2026]

Title: SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

Authors: Guifeng Deng, Pan Wang, Jiquan Wang, Shuying Rao, Junyi Xie, Wanjun Guo, Tao Li, Haiteng Jiang

Abstract: While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) that stages sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Using waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validated the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitat...