[2602.21365] Towards Controllable Video Synthesis of Routine and Rare OR Events
Summary
The paper presents a video diffusion framework for the controlled synthesis of routine and rare operating room (OR) events, addressing the shortage of training data for AI systems that monitor the OR.
Why It Matters
Datasets of rare and safety-critical OR events are hard to curate, both operationally and ethically, which slows the development of AI systems meant to detect, understand, and mitigate such events. Synthesizing these events under explicit control offers a way to produce training and validation data that real recordings rarely provide.
Key Takeaways
- Introduces a video diffusion framework for synthesizing OR events.
- Outperforms existing video diffusion models in synthesizing routine OR events.
- Achieves a 70.13% recall rate in detecting near-miss safety events.
- Facilitates the creation of synthetic datasets for AI training.
- Demonstrates potential for enhancing ambient intelligence in healthcare.
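
For readers unfamiliar with the metric, recall is the fraction of actual positive cases (here, true near-miss events) that a detector flags: TP / (TP + FN). The sketch below shows the computation on a small set of hypothetical clip labels. The labels, the variable names, and the `recall` helper are illustrative assumptions, not data or code from the paper.

```python
def recall(y_true, y_pred):
    """Return TP / (TP + FN) for binary labels (1 = near-miss, 0 = routine)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    actual_pos = true_pos + false_neg
    if actual_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / actual_pos


# Hypothetical ground-truth labels and detector outputs for ten clips.
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]

example_recall = recall(y_true, y_pred)
print(f"recall = {example_recall:.2%}")  # prints "recall = 71.43%" (5 of 7 near-misses found)
```

Note that recall ignores false alarms: the one routine clip flagged as a near-miss in this toy example does not lower the score, which is why recall is usually reported alongside precision.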
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.21365 (cs)
[Submitted on 24 Feb 2026]
Title: Towards Controllable Video Synthesis of Routine and Rare OR Events
Authors: Dominik Schneider, Lalithkumar Seenivasan, Sampath Rapuri, Vishalroshan Anil, Aiza Maksutova, Yiqing Shen, Jan Emily Mangulabnan, Hao Ding, Jose L. Porras, Masaru Ishii, Mathias Unberath
Abstract:
Purpose: Curating large-scale datasets of operating room (OR) workflow, encompassing rare, safety-critical, or atypical events, remains operationally and ethically challenging. This data bottleneck complicates the development of ambient intelligence for detecting, understanding, and mitigating rare or safety-critical events in the OR.
Methods: This work presents an OR video diffusion framework that enables controlled synthesis of rare and safety-critical events. The framework integrates a geometric abstraction module, a conditioning module, and a fine-tuned diffusion model to first transform OR scenes into abstract geometric representations, then condition the synthesis process, and finally generate realistic OR event videos. Using this framework, we also curate a synthetic dataset to train and validate AI models for detecting near-misses of sterile-field violations.
Results: In synthesizing routine OR events, our method outperf...