[2602.21531] LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies

arXiv - Machine Learning

Summary

The paper introduces LiLo-VLA, a modular framework for long-horizon manipulation in robotics, enhancing performance through object-centric policies and robust failure recovery.

Why It Matters

As robots increasingly operate in unstructured environments, mastering long-horizon manipulation is crucial. LiLo-VLA addresses the challenges of sequencing skills and environmental sensitivity, offering a promising solution for general-purpose robotics, which could lead to more adaptable and efficient robotic systems.

Key Takeaways

  • LiLo-VLA enables zero-shot generalization to new long-horizon tasks.
  • The framework decouples transport and interaction for enhanced robustness.
  • Achieves a 69% success rate in simulations and 85% in real-world tasks.
  • Modularity allows for dynamic replanning and effective failure recovery.
  • Significantly outperforms existing models such as Pi0.5 and OpenVLA-OFT.

Computer Science > Robotics
arXiv:2602.21531 (cs) [Submitted on 25 Feb 2026]

Title: LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
Authors: Yue Yang, Shuo Cheng, Yu Fang, Homanga Bharadhwaj, Mingyu Ding, Gedas Bertasius, Daniel Szafir

Abstract: General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments. While Vision-Language-Action (VLA) models offer the potential to master diverse atomic skills, they struggle with the combinatorial complexity of sequencing them and are prone to cascading failures due to environmental sensitivity. To address these challenges, we propose LiLo-VLA (Linked Local VLA), a modular framework capable of zero-shot generalization to novel long-horizon tasks without ever being trained on them. Our approach decouples transport from interaction: a Reaching Module handles global motion, while an Interaction Module employs an object-centric VLA to process isolated objects of interest, ensuring robustness against irrelevant visual features and invariance to spatial configurations. Crucially, this modularity facilitates robust failure recovery through dynamic replanning and skill reuse, effectively mitigating the c...
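To make the decoupling concrete, here is a minimal Python sketch of the control flow the abstract describes: a transport stage that moves the arm near the target, an object-centric interaction stage, and per-subtask retries standing in for dynamic replanning. All class and method names (`ReachingModule`, `InteractionModule`, `execute_plan`, etc.) are hypothetical illustrations, not APIs from the paper.

```python
# Hypothetical sketch of LiLo-VLA's "decouple transport from interaction"
# idea. The names below are invented for illustration; the paper does not
# publish this interface.

from dataclasses import dataclass


@dataclass
class Subtask:
    target_object: str
    skill: str  # e.g. "grasp", "insert", "detach"


class ReachingModule:
    """Global motion: bring the end-effector near the target object."""

    def reach(self, target_object: str) -> bool:
        # Placeholder: a real system would plan and execute a trajectory.
        return True


class InteractionModule:
    """Object-centric VLA stand-in: acts on an isolated view of the target,
    so irrelevant visual features and absolute pose do not matter."""

    def interact(self, target_object: str, skill: str) -> bool:
        # Placeholder: a real system would run the learned local policy.
        return True


def execute_plan(plan, reacher, interactor, max_retries=2):
    """Sequence subtasks; on failure, reuse the same skill (a crude stand-in
    for the paper's dynamic replanning and skill reuse)."""
    for subtask in plan:
        for _attempt in range(max_retries + 1):
            if (reacher.reach(subtask.target_object)
                    and interactor.interact(subtask.target_object,
                                            subtask.skill)):
                break  # subtask done, move on
        else:
            return False  # retries exhausted: unrecoverable failure
    return True
```

Because each subtask is retried independently, a failure in one skill does not cascade into the rest of the sequence, which is the robustness argument the abstract makes for modularity.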
