[2602.16110] OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

arXiv - AI · 4 min read

Summary

The paper presents OmniCT, a unified slice-volume large vision-language model (LVLM) designed for comprehensive CT analysis, addressing limitations in existing models by enhancing spatial consistency and semantic alignment.

Why It Matters

OmniCT represents a significant advancement in medical imaging analysis, as it integrates slice and volumetric data to improve diagnostic accuracy. This unified approach could lead to better clinical outcomes by providing more reliable interpretations of CT scans, which are critical in diagnosing various conditions.

Key Takeaways

  • OmniCT enhances spatial consistency in CT analysis through innovative modeling techniques.
  • The model improves organ-level semantic understanding, crucial for accurate diagnosis.
  • The paper introduces MedEval-CT, a comprehensive benchmark dataset for evaluating slice-volume models.
  • OmniCT outperforms existing methods across diverse clinical tasks.
  • The research establishes a new paradigm for cross-modal understanding in medical imaging.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.16110 (cs) [Submitted on 18 Feb 2026]

Title: OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

Authors: Tianwei Lin, Zhongwei Qiu, Wenqiao Zhang, Jiang Liu, Yihan Xie, Mingjian Gao, Zhenxuan Fan, Zhaocheng Li, Sijing Li, Zhongle Xie, Peng LU, Yueting Zhuang, Yingda Xia, Ling Zhang, Beng Chin Ooi

Abstract: Computed Tomography (CT) is one of the most widely used and diagnostically information-dense imaging modalities, covering critical organs such as the heart, lungs, liver, and colon. Clinical interpretation relies on both slice-driven local features (e.g., sub-centimeter nodules, lesion boundaries) and volume-driven spatial representations (e.g., tumor infiltration, inter-organ anatomical relations). However, existing Large Vision-Language Models (LVLMs) remain fragmented in CT slice versus volumetric understanding: slice-driven LVLMs show strong generalization but lack cross-slice spatial consistency, while volume-driven LVLMs explicitly capture volumetric semantics but suffer from coarse granularity and poor compatibility with slice inputs. The absence of a unified modeling paradigm constitutes a major bottleneck for the clinical translation of medical LVLMs. We present OmniCT, a powerful unified slice-volume LVLM...

Related Articles

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...

arXiv - AI · 3 min
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min
[2603.26292] findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
LLMs

Abstract page for arXiv paper 2603.26292: findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding

arXiv - AI · 3 min