[2602.16110] OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Summary
The paper presents OmniCT, a unified slice-volume large vision-language model (LVLM) designed for comprehensive CT analysis, addressing limitations in existing models by enhancing spatial consistency and semantic alignment.
Why It Matters
OmniCT represents a significant advancement in medical imaging analysis, as it integrates slice and volumetric data to improve diagnostic accuracy. This unified approach could lead to better clinical outcomes by providing more reliable interpretations of CT scans, which are critical in diagnosing various conditions.
Key Takeaways
- OmniCT enhances spatial consistency in CT analysis through innovative modeling techniques.
- The model improves organ-level semantic understanding, crucial for accurate diagnosis.
- The work introduces MedEval-CT, a comprehensive dataset for evaluating slice-volume models.
- OmniCT outperforms existing methods across diverse clinical tasks.
- The research establishes a new paradigm for cross-modal understanding in medical imaging.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.16110 (cs) [Submitted on 18 Feb 2026]
Title: OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Authors: Tianwei Lin, Zhongwei Qiu, Wenqiao Zhang, Jiang Liu, Yihan Xie, Mingjian Gao, Zhenxuan Fan, Zhaocheng Li, Sijing Li, Zhongle Xie, Peng LU, Yueting Zhuang, Yingda Xia, Ling Zhang, Beng Chin Ooi
Abstract: Computed Tomography (CT) is one of the most widely used and diagnostically information-dense imaging modalities, covering critical organs such as the heart, lungs, liver, and colon. Clinical interpretation relies on both slice-driven local features (e.g., sub-centimeter nodules, lesion boundaries) and volume-driven spatial representations (e.g., tumor infiltration, inter-organ anatomical relations). However, existing Large Vision-Language Models (LVLMs) remain fragmented in CT slice versus volumetric understanding: slice-driven LVLMs show strong generalization but lack cross-slice spatial consistency, while volume-driven LVLMs explicitly capture volumetric semantics but suffer from coarse granularity and poor compatibility with slice inputs. The absence of a unified modeling paradigm constitutes a major bottleneck for the clinical translation of medical LVLMs. We present OmniCT, a powerful unified slice-volume LVLM...