[2602.16356] Articulated 3D Scene Graphs for Open-World Mobile Manipulation

arXiv - AI · 4 min read

Summary

This paper presents MoMa-SG, a framework for creating semantic-kinematic 3D scene graphs to enhance mobile manipulation of articulated objects in real-world environments.

Why It Matters

The ability of robots to understand and interact with articulated objects in dynamic environments is crucial for advancing robotics in everyday settings. This research addresses significant limitations of current mobile manipulation systems, providing a foundation for more intelligent and adaptable robots.

Key Takeaways

  • MoMa-SG framework enables robust manipulation of articulated objects.
  • Introduces the Arti4D-Semantic dataset for improved training.
  • Utilizes a novel unified twist estimation that recovers revolute and prismatic joint parameters in a single optimization pass (see the sketch after this list).
  • Demonstrates real-world applicability with quadruped and mobile manipulators.
  • Addresses the gap between semantics, geometry, and kinematics in robotics.
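
The digest does not spell out the unified twist formulation, but the underlying idea can be illustrated. The Python sketch below is an assumption-laden stand-in, not the paper's method: it fits a rigid transform to two corresponded point sets with the Kabsch algorithm, takes the SE(3) matrix logarithm to obtain a twist (v, w), and classifies the joint as prismatic when the rotational component w is near zero, otherwise as revolute with axis w/||w||. All function names, thresholds, and the two-frame setup are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of recovering a joint
# twist from tracked 3D points on a moving part.
import numpy as np
from scipy.linalg import logm

def kabsch_se3(P0, P1):
    """Least-squares rigid transform T (4x4) with P1 ~ R @ P0 + t (Kabsch)."""
    c0, c1 = P0.mean(axis=0), P1.mean(axis=0)
    H = (P0 - c0).T @ (P1 - c1)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, c1 - R @ c0
    return T

def joint_from_transform(T, eps=1e-3):
    """Matrix log of T yields a twist (v, w); ||w|| separates joint types."""
    xi = np.real(logm(T))                      # 4x4 twist matrix
    w = np.array([xi[2, 1], xi[0, 2], xi[1, 0]])
    v = xi[:3, 3]
    theta = np.linalg.norm(w)
    if theta < eps:                            # negligible rotation
        d = np.linalg.norm(v)
        if d < eps:
            return {"type": "static"}
        return {"type": "prismatic", "axis": v / d, "displacement": d}
    axis = w / theta                           # revolute joint
    point = np.cross(axis, v) / theta          # a point on the rotation axis
    return {"type": "revolute", "axis": axis, "point": point, "angle": theta}

# Toy check: part points rotated 30 degrees about the z-axis through the origin.
rng = np.random.default_rng(0)
P0 = rng.normal(size=(50, 3))
a = np.deg2rad(30.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
print(joint_from_transform(kabsch_se3(P0, P0 @ Rz.T)))  # revolute, axis ~ z
```

The paper reportedly estimates both joint types over full 3D point trajectories in a single optimization pass; the per-pair decomposition above is only a simplified approximation of that idea.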

Computer Science > Robotics · arXiv:2602.16356 (cs) · Submitted on 18 Feb 2026

Title: Articulated 3D Scene Graphs for Open-World Mobile Manipulation

Authors: Martin Büchner, Adrian Röfer, Tim Engelbracht, Tim Welschehold, Zuria Bauer, Hermann Blum, Marc Pollefeys, Abhinav Valada

Abstract: Semantics has enabled 3D scene understanding and affordance-driven object interaction. However, robots operating in real-world environments face a critical limitation: they cannot anticipate how objects move. Long-horizon mobile manipulation requires closing the gap between semantics, geometry, and kinematics. In this work, we present MoMa-SG, a novel framework for building semantic-kinematic 3D scene graphs of articulated scenes containing a myriad of interactable objects. Given RGB-D sequences containing multiple object articulations, we temporally segment object interactions and infer object motion using occlusion-robust point tracking. We then lift point trajectories into 3D and estimate articulation models using a novel unified twist estimation formulation that robustly estimates revolute and prismatic joint parameters in a single optimization pass. Next, we associate objects with estimated articulations and detect contained objects by reasoning over parent-child relations at identified opening states. We also introduce the nov...
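
To make the parent-child reasoning above concrete: detecting contained objects at identified opening states suggests a graph whose edges carry either an estimated joint model or a containment relation, so one structure answers both "how does it move" and "what is inside". The sketch below is a hypothetical data structure, not the paper's; every class, field, and label is an assumption chosen for illustration.

```python
# Hypothetical semantic-kinematic scene graph (names and fields are
# assumptions, not the paper's data structure).
from dataclasses import dataclass, field

@dataclass
class Joint:
    type: str                                  # "revolute" or "prismatic"
    axis: tuple                                # unit direction of the joint axis
    origin: tuple = (0.0, 0.0, 0.0)            # point on the axis (revolute)
    limits: tuple = (0.0, 1.57)                # estimated articulation range

@dataclass
class Node:
    label: str                                 # open-vocabulary semantic class
    children: dict = field(default_factory=dict)   # label -> (relation, Node)

    def attach(self, child, joint):            # articulated parent-child edge
        self.children[child.label] = (joint, child)

    def contains(self, child):                 # containment found at an opening state
        self.children[child.label] = ("contained-in", child)

# Example: a cabinet whose revolute door reveals a contained mug.
cabinet = Node("cabinet")
cabinet.attach(Node("cabinet_door"), Joint("revolute", axis=(0.0, 0.0, 1.0)))
cabinet.contains(Node("mug"))

rel, _ = cabinet.children["cabinet_door"]
print(rel.type, rel.axis)                      # revolute (0.0, 0.0, 1.0)
```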

Related Articles

Machine Learning

[P] Run Karpathy's Autoresearch for $0.44 instead of $24 — Open-source parallel evolution pipeline on SageMaker Spot

TL;DR: I built an open-source pipeline that runs Karpathy's autoresearch on SageMaker Spot instances — 25 autonomous ML experiments for $...

Reddit - Machine Learning · 1 min
Robotics

[D] Awesome AI Agent Incidents - A curated list of incidents, attack vectors, failure modes, and defensive tools for autonomous AI agents.

https://github.com/h5i-dev/awesome-ai-agent-incidents (submitted by /u/Living_Impression_37)

Reddit - Machine Learning · 1 min
LLMs

An attack class that passes every current LLM filter - no payload, no injection signature, no log trace

https://shapingrooms.com/research I published a paper today on something I've been calling postural manipulation. The short version: ordi...

Reddit - Artificial Intelligence · 1 min