[2602.20636] SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement

arXiv - AI · 4 min read · Article

Summary

The paper presents SurgAtt-Tracker, a novel framework for online surgical attention tracking that enhances minimally invasive surgery through advanced spatio-temporal learning techniques.

Why It Matters

This research addresses critical challenges in surgical procedures by improving field-of-view guidance, which is essential for safety and efficiency. The introduction of a large-scale benchmark, SurgAtt-1.16M, further supports the development of robust attention tracking systems in surgical settings.

Key Takeaways

  • SurgAtt-Tracker models surgical attention as a dense heatmap for better FoV guidance.
  • The framework utilizes temporal coherence through proposal-level reranking and motion-aware refinement.
  • SurgAtt-1.16M benchmark facilitates systematic training and evaluation of attention tracking.
  • Extensive experiments demonstrate state-of-the-art performance under challenging conditions.
  • The approach can enhance robotic FoV planning and automatic camera control in surgeries.
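The reranking-plus-refinement idea in the takeaways above can be sketched in a few lines: given several candidate attention heatmaps for the current frame, score each by blending its appearance confidence with its temporal consistency against the previous frame's heatmap, then warp the winner by an estimated inter-frame motion. This is an illustrative sketch only; the function names, the cosine-similarity consistency term, and the integer-pixel translation standing in for motion-aware refinement are assumptions, not the paper's actual method or API.

```python
import numpy as np

def rerank_proposals(proposals, scores, prev_heatmap, alpha=0.5):
    """Pick the best candidate heatmap by blending each proposal's
    appearance score with its temporal consistency, measured here as
    cosine similarity to the previous frame's heatmap (illustrative
    stand-in for the paper's proposal-level reranking)."""
    def consistency(h):
        a, b = h.ravel(), prev_heatmap.ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
        return float(a @ b) / denom

    combined = [alpha * s + (1 - alpha) * consistency(h)
                for h, s in zip(proposals, scores)]
    return proposals[int(np.argmax(combined))]

def motion_refine(heatmap, flow_dx, flow_dy):
    """Shift the selected heatmap by an estimated inter-frame motion.
    An integer-pixel translation stands in for the learned
    motion-aware refinement described in the abstract."""
    return np.roll(np.roll(heatmap, flow_dy, axis=0), flow_dx, axis=1)
```

With equal appearance scores, the proposal that agrees with the previous frame wins the reranking, which is exactly the temporal-coherence behavior the takeaways describe.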

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.20636 (cs) · [Submitted on 24 Feb 2026]

Title: SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement

Authors: Rulin Zhou, Guankun Wang, An Wang, Yujie Ma, Lixin Ouyang, Bolin Cui, Junyan Li, Chaowei Zhu, Mingyang Li, Ming Chen, Xiaopin Zhong, Peng Lu, Jiankun Wang, Xianming Liu, Hongliang Ren

Abstract: Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflate visual attention estimation with downstream camera control or rely on direct object-centric assumptions. In this work, we formulate surgical attention tracking as a spatio-temporal learning problem and model surgeon focus as a dense attention heatmap, enabling continuous and interpretable frame-wise FoV guidance. We propose SurgAtt-Tracker, a holistic framework that robustly tracks surgical attention by exploiting temporal coherence through proposal-level reranking and motion-aware refinement, rather than direct regression. To support systematic training and evaluation, we introduce SurgAtt-1.16M, a large-scale benchmark with a clinically grounded annotation protocol that enables comprehensive heatmap-...

Related Articles

[2603.18940] Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
LLMs · arXiv - Machine Learning · 3 min

[2512.20620] Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps
Machine Learning · arXiv - Machine Learning · 4 min

[2512.13607] Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Machine Learning · arXiv - Machine Learning · 4 min

[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Machine Learning · arXiv - Machine Learning · 3 min