[2602.13332] MedScope: Incentivizing "Think with Videos" for Clinical Reasoning via Coarse-to-Fine Tool Calling

arXiv - AI · 4 min read

Summary

The paper presents MedScope, a clinical video reasoning model that enhances decision-making in medical contexts by integrating tool use and evidence verification.

Why It Matters

MedScope addresses a key limitation of current multimodal large language models: their inability to reliably process long-form clinical videos, which constrains accuracy in medical AI applications. By providing a method for evidence-based reasoning that can locate and verify its own supporting observations, this research could significantly impact surgical robotics and clinical decision-making.

Key Takeaways

  • MedScope improves clinical video reasoning through coarse-to-fine tool calling.
  • The model enhances accuracy by grounding predictions in temporally localized evidence.
  • ClinVideoSuite provides high-fidelity supervision for training the model.
  • MedScope achieves state-of-the-art performance on video understanding benchmarks.
  • The approach paves the way for more reliable medical AI agents.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.13332 (cs) · Submitted on 11 Feb 2026

Title: MedScope: Incentivizing "Think with Videos" for Clinical Reasoning via Coarse-to-Fine Tool Calling

Authors: Wenjie Li, Yujie Zhang, Haoran Sun, Xingqi He, Hongcheng Gao, Chenglong Ma, Ming Hu, Guankun Wang, Shiyi Yao, Renhao Yang, Hongliang Ren, Lei Wang, Junjun He, Yankai Jiang

Abstract: Long-form clinical videos are central to visual evidence-based decision-making, with growing importance for applications such as surgical robotics and related settings. However, current multimodal large language models typically process videos with passive sampling or weakly grounded inspection, which limits their ability to iteratively locate, verify, and justify predictions with temporally targeted evidence. To close this gap, we propose MedScope, a tool-using clinical video reasoning model that performs coarse-to-fine evidence seeking over long-form procedures. By interleaving intermediate reasoning with targeted tool calls and verification on retrieved observations, MedScope produces more accurate and trustworthy predictions that are explicitly grounded in temporally localized visual evidence. To address the lack of high-fidelity supervision, we build ClinVideoSuite, an evidence-centr...
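
The coarse-to-fine pattern the abstract describes can be pictured as a loop: a coarse pass nominates candidate segments of the long procedure, a fine pass inspects the most promising segment at higher temporal resolution, and an answer is only committed once it is verified against the retrieved observation. The sketch below is a minimal, hypothetical illustration of that control flow, not MedScope's actual tool API; names such as scan_video, inspect_segment, and verify are placeholder stubs standing in for the model's real video tools and verification step, and return canned values so the loop is runnable on its own.

```python
# Minimal sketch of a coarse-to-fine evidence-seeking loop (illustrative only).
# All names below are hypothetical placeholders, not MedScope's actual tools.

from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # candidate segment start, in seconds
    end_s: float    # candidate segment end, in seconds


@dataclass
class Evidence:
    segment: Segment
    observation: str
    supports_answer: bool


def scan_video(question: str, duration_s: float, stride_s: float = 60.0) -> list[Segment]:
    """Coarse pass: sparsely partition the long-form video into candidate
    segments worth inspecting (placeholder: a uniform temporal grid)."""
    starts = range(0, int(duration_s), int(stride_s))
    return [Segment(float(s), min(s + stride_s, duration_s)) for s in starts]


def inspect_segment(seg: Segment) -> str:
    """Fine pass: densely sample the chosen segment and describe what is seen
    (placeholder: returns a fixed observation string)."""
    return f"observation for {seg.start_s:.0f}-{seg.end_s:.0f}s"


def verify(hypothesis: str, observation: str) -> bool:
    """Check the working hypothesis against the retrieved observation
    (placeholder: trivially accepts any non-empty pair)."""
    return bool(hypothesis) and bool(observation)


def coarse_to_fine_reasoning(question: str, duration_s: float, max_steps: int = 6):
    """Interleave reasoning with targeted tool calls until the prediction is
    grounded in a temporally localized observation."""
    candidates = scan_video(question, duration_s)
    evidence: list[Evidence] = []
    hypothesis = ""
    for seg in candidates[:max_steps]:
        obs = inspect_segment(seg)                # targeted tool call
        hypothesis = f"answer drawing on {obs}"   # update working answer
        ok = verify(hypothesis, obs)              # verification step
        evidence.append(Evidence(seg, obs, ok))
        if ok:
            break                                 # stop once grounded
    return hypothesis, evidence


if __name__ == "__main__":
    answer, trail = coarse_to_fine_reasoning("Which phase shows bleeding?", duration_s=1800.0)
    print(answer)
    for e in trail:
        print(e.segment.start_s, e.segment.end_s, e.supports_answer)
```

In the real system, each stub would correspond to a tool call issued by the multimodal model during its reasoning trace, with the verification step grounding the final prediction in the retrieved clip rather than in the coarse summary alone.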

