[2602.19359] Vid2Sid: Videos Can Help Close the Sim2Real Gap
Summary
The paper presents Vid2Sid, a novel video-driven system identification pipeline that enhances the calibration of robot simulators by analyzing sim-real video pairs to propose physics parameter updates.
Why It Matters
As robotics increasingly relies on accurate simulation, Vid2Sid addresses the critical challenge of bridging the gap between simulated and real-world behavior. By proposing interpretable, natural-language-justified updates to simulation parameters, it makes simulator calibration both more accurate and more transparent, which is essential for deploying robotic systems reliably in practice.
Key Takeaways
- Vid2Sid uses video analysis to improve robot simulator calibration.
- It provides interpretable reasoning for physics parameter updates.
- The method matches or exceeds traditional black-box optimizers in accuracy while remaining interpretable.
- Sim2sim validation confirms it recovers ground-truth parameters most accurately among the compared methods.
- Performance varies based on the quality of perception and simulator expressiveness.
Computer Science > Robotics — arXiv:2602.19359 (cs)
[Submitted on 22 Feb 2026]
Title: Vid2Sid: Videos Can Help Close the Sim2Real Gap
Authors: Kevin Qiu, Yu Zhang, Marek Cygan, Josie Hughes
Abstract: Calibrating a robot simulator's physics parameters (friction, damping, material stiffness) to match real hardware is often done by hand or with black-box optimizers that reduce error but cannot explain which physical discrepancies drive the error. When sensing is limited to external cameras, the problem is further compounded by perception noise and the absence of direct force or state measurements. We present Vid2Sid, a video-driven system identification pipeline that couples foundation-model perception with a VLM-in-the-loop optimizer that analyzes paired sim-real videos, diagnoses concrete mismatches, and proposes physics parameter updates with natural language rationales. We evaluate our approach on a tendon-actuated finger (rigid-body dynamics in MuJoCo) and a deformable continuum tentacle (soft-body dynamics in PyElastica). On sim2real holdout controls unseen during training, Vid2Sid achieves the best average rank across all settings, matching or exceeding black-box optimizers while uniquely providing interpretable reasoning at each iteration. Sim2sim validation confirms that Vid2Sid recovers ground-truth parameters most accurately (mean...
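The abstract describes an iterative loop: simulate with the current parameters, have a VLM compare the sim and real observations, and apply its proposed parameter update. The toy sketch below illustrates that loop's control flow only; the one-number "trajectory", the heuristic standing in for the VLM, and the parameter names are all hypothetical stand-ins, not the authors' actual pipeline or API.

```python
# Toy sketch of a VLM-in-the-loop calibration cycle in the spirit of Vid2Sid.
# Everything here (simulate, vlm_diagnose, the parameter names) is a stand-in.

def simulate(params):
    # Stand-in for rolling out the simulator and summarizing the video as a
    # single scalar "trajectory feature" (e.g., final fingertip position).
    return 10.0 / (1.0 + params["friction"]) - 2.0 * params["damping"]

def vlm_diagnose(sim_obs, real_obs, params):
    # Stand-in for the VLM: compares paired sim/real observations, diagnoses
    # the mismatch, and proposes an update with a natural-language rationale.
    error = sim_obs - real_obs
    if abs(error) < 1e-3:
        return None  # sim and real agree; calibration is done
    step = 0.1 if error > 0 else -0.1
    rationale = ("sim overshoots the real motion; increasing friction"
                 if error > 0
                 else "sim undershoots the real motion; decreasing friction")
    return {"friction": params["friction"] + step}, rationale

def calibrate(real_obs, params, max_iters=50):
    # Iterate: simulate, diagnose, apply the proposed update, repeat.
    history = []
    for _ in range(max_iters):
        sim_obs = simulate(params)
        proposal = vlm_diagnose(sim_obs, real_obs, params)
        if proposal is None:
            break
        update, rationale = proposal
        history.append(rationale)
        params = {**params, **update}
    return params, history

# Usage: generate a "real" observation from hidden friction=0.5, then recover it.
real_obs = simulate({"friction": 0.5, "damping": 0.2})
fitted, history = calibrate(real_obs, {"friction": 0.0, "damping": 0.2})
```

The key property the paper claims for this style of optimizer, in contrast to black-box search, is the `history` of rationales: each iteration yields a human-readable explanation of which physical discrepancy drove the update.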