[2602.21531] LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
Summary
The paper introduces LiLo-VLA, a modular framework for long-horizon manipulation in robotics, enhancing performance through object-centric policies and robust failure recovery.
Why It Matters
As robots increasingly operate in unstructured environments, mastering long-horizon manipulation becomes crucial. LiLo-VLA addresses the challenges of skill sequencing and environmental sensitivity, offering a promising path toward more adaptable and efficient general-purpose robots.
Key Takeaways
- LiLo-VLA enables zero-shot generalization to new long-horizon tasks.
- The framework decouples transport and interaction for enhanced robustness.
- Achieves a 69% success rate in simulations and 85% in real-world tasks.
- Modularity allows for dynamic replanning and effective failure recovery.
- Significantly outperforms existing models such as Pi0.5 and OpenVLA-OFT.
Computer Science > Robotics · arXiv:2602.21531 (cs) · [Submitted on 25 Feb 2026]
Title: LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
Authors: Yue Yang, Shuo Cheng, Yu Fang, Homanga Bharadhwaj, Mingyu Ding, Gedas Bertasius, Daniel Szafir
Abstract: General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments. While Vision-Language-Action (VLA) models offer the potential to master diverse atomic skills, they struggle with the combinatorial complexity of sequencing them and are prone to cascading failures due to environmental sensitivity. To address these challenges, we propose LiLo-VLA (Linked Local VLA), a modular framework capable of zero-shot generalization to novel long-horizon tasks without ever being trained on them. Our approach decouples transport from interaction: a Reaching Module handles global motion, while an Interaction Module employs an object-centric VLA to process isolated objects of interest, ensuring robustness against irrelevant visual features and invariance to spatial configurations. Crucially, this modularity facilitates robust failure recovery through dynamic replanning and skill reuse, effectively mitigating the c...
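The decoupled control flow the abstract describes can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: every class, field, and function name here (`Observation`, `ReachingModule`, `InteractionModule`, `step`, `run_skill_sequence`) is an assumption for exposition only.

```python
# Hypothetical sketch of a LiLo-VLA-style modular control loop.
# All names are illustrative assumptions, not the paper's actual code.

from dataclasses import dataclass

@dataclass
class Observation:
    robot_pose: tuple      # global robot / end-effector pose
    object_crop: str       # object-centric view of the target object
    near_target: bool      # whether the transport phase has finished

class ReachingModule:
    """Handles global transport toward the object of interest."""
    def act(self, obs: Observation) -> dict:
        return {"type": "move", "goal": obs.robot_pose}

class InteractionModule:
    """Object-centric policy that sees only the isolated object crop,
    keeping it robust to irrelevant visual features elsewhere in the scene."""
    def act(self, obs: Observation) -> dict:
        return {"type": "manipulate", "input": obs.object_crop}

def step(obs: Observation, reaching: ReachingModule,
         interaction: InteractionModule) -> dict:
    # Decoupling: transport until near the target, then hand off
    # to the object-centric interaction policy.
    if not obs.near_target:
        return reaching.act(obs)
    return interaction.act(obs)

def run_skill_sequence(skills, execute):
    """Failure recovery via replanning: if an atomic skill fails,
    retry (reuse) it instead of letting the failure cascade."""
    completed = []
    for skill in skills:
        while not execute(skill):
            pass  # replan / retry the same atomic skill
        completed.append(skill)
    return completed
```

Because the interaction policy only ever consumes the object crop, swapping the transport strategy or reordering skills does not require retraining it, which is one plausible reading of how modularity enables zero-shot composition of long-horizon tasks.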