[2602.19359] Vid2Sid: Videos Can Help Close the Sim2Real Gap
Summary
The paper presents Vid2Sid, a novel video-driven system identification pipeline that enhances the calibration of robot simulators by analyzing sim-real video pairs to propose physics parameter updates.
Why It Matters
As robotics increasingly relies on accurate simulation, Vid2Sid addresses the critical challenge of bridging the gap between simulated and real-world behavior. By proposing interpretable, natural-language-justified updates to simulation parameters, it makes simulator calibration both more accurate and more transparent, which is essential for deploying robotic systems reliably in practice.
Key Takeaways
- Vid2Sid uses video analysis to improve robot simulator calibration.
- It provides interpretable reasoning for physics parameter updates.
- The method matches or exceeds traditional black-box optimizers in accuracy while remaining interpretable.
- Sim2sim validation confirms it recovers ground-truth parameters most accurately among the compared methods.
- Performance varies based on the quality of perception and simulator expressiveness.
Computer Science > Robotics — arXiv:2602.19359 (cs)
[Submitted on 22 Feb 2026]
Title: Vid2Sid: Videos Can Help Close the Sim2Real Gap
Authors: Kevin Qiu, Yu Zhang, Marek Cygan, Josie Hughes
Abstract: Calibrating a robot simulator's physics parameters (friction, damping, material stiffness) to match real hardware is often done by hand or with black-box optimizers that reduce error but cannot explain which physical discrepancies drive the error. When sensing is limited to external cameras, the problem is further compounded by perception noise and the absence of direct force or state measurements. We present Vid2Sid, a video-driven system identification pipeline that couples foundation-model perception with a VLM-in-the-loop optimizer that analyzes paired sim-real videos, diagnoses concrete mismatches, and proposes physics parameter updates with natural language rationales. We evaluate our approach on a tendon-actuated finger (rigid-body dynamics in MuJoCo) and a deformable continuum tentacle (soft-body dynamics in PyElastica). On sim2real holdout controls unseen during training, Vid2Sid achieves the best average rank across all settings, matching or exceeding black-box optimizers while uniquely providing interpretable reasoning at each iteration. Sim2sim validation confirms that Vid2Sid recovers ground-truth parameters most accurately (mean...
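The abstract describes an iterative loop: simulate with the current parameters, have a VLM compare the sim and real observations, and apply its proposed parameter update. The toy sketch below illustrates that loop's control flow only; the one-number "trajectory", the heuristic standing in for the VLM, and the parameter names are all hypothetical stand-ins, not the authors' actual pipeline or API.

```python
# Toy sketch of a VLM-in-the-loop calibration cycle in the spirit of Vid2Sid.
# Everything here (simulate, vlm_diagnose, the parameter names) is a stand-in.

def simulate(params):
    # Stand-in for rolling out the simulator and summarizing the video as a
    # single scalar "trajectory feature" (e.g., final fingertip position).
    return 10.0 / (1.0 + params["friction"]) - 2.0 * params["damping"]

def vlm_diagnose(sim_obs, real_obs, params):
    # Stand-in for the VLM: compares paired sim/real observations, diagnoses
    # the mismatch, and proposes an update with a natural-language rationale.
    error = sim_obs - real_obs
    if abs(error) < 1e-3:
        return None  # sim and real agree; calibration is done
    step = 0.1 if error > 0 else -0.1
    rationale = ("sim overshoots the real motion; increasing friction"
                 if error > 0
                 else "sim undershoots the real motion; decreasing friction")
    return {"friction": params["friction"] + step}, rationale

def calibrate(real_obs, params, max_iters=50):
    # Iterate: simulate, diagnose, apply the proposed update, repeat.
    history = []
    for _ in range(max_iters):
        sim_obs = simulate(params)
        proposal = vlm_diagnose(sim_obs, real_obs, params)
        if proposal is None:
            break
        update, rationale = proposal
        history.append(rationale)
        params = {**params, **update}
    return params, history

# Usage: generate a "real" observation from hidden friction=0.5, then recover it.
real_obs = simulate({"friction": 0.5, "damping": 0.2})
fitted, history = calibrate(real_obs, {"friction": 0.0, "damping": 0.2})
```

The key property the paper claims for this style of optimizer, in contrast to black-box search, is the `history` of rationales: each iteration yields a human-readable explanation of which physical discrepancy drove the update.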