[2603.15270] From Documents to Spans: Scalable Supervision for Evidence-Based ICD Coding with LLMs
Computer Science > Computation and Language
arXiv:2603.15270 (cs)
[Submitted on 16 Mar 2026 (v1), last revised 7 May 2026 (this version, v2)]

Title: From Documents to Spans: Scalable Supervision for Evidence-Based ICD Coding with LLMs
Authors: Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Zhiyang He, Xiaodong Tao, Kun Zhang, S. Kevin Zhou

Abstract: International Classification of Diseases (ICD) coding assigns diagnosis codes to clinical documents and is essential for healthcare billing and clinical analysis. Reliable coding requires that each predicted code be supported by explicit textual evidence. However, existing public datasets provide only code labels, without evidence annotations, limiting models' ability to learn evidence-grounded predictions. In this work, we argue that dense, document-level evidence annotation is not always necessary for learning evidence-based coding. Instead, models can learn code-specific evidence patterns from local spans and use these patterns to support document-level evidence-based coding. Based on this insight, we propose Span-Centric Learning (SCL), a training framework that strengthens LLMs' coding ability at the span level and transfers this capability to full clinical documents. Specifically, we use a small set of annotated documents to supervise evidence reco...