[2602.15872] MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models

arXiv - Machine Learning 3 min read Article

Summary

The paper presents MARVL, an approach to reward design for robotic reinforcement learning. It fine-tunes a Vision-Language Model (VLM) for spatial and semantic consistency and decomposes each task into multi-stage subtasks, so that VLM-derived rewards better track task progress.

Why It Matters

Dense reward functions are pivotal for efficient robotic reinforcement learning, but most are hand-engineered, which limits scalability and automation. VLMs offer a way to automate reward design, yet naive VLM rewards often misalign with task progress, struggle with spatial grounding, and show limited understanding of task semantics. MARVL targets these limitations.

Key Takeaways

  • MARVL uses a fine-tuned VLM to design rewards for robotic manipulation.
  • It decomposes tasks into multi-stage subtasks and applies task direction projection for trajectory sensitivity.
  • Empirically, MARVL significantly outperforms existing VLM-reward methods on the Meta-World benchmark.
  • It shows superior sample efficiency and robustness on sparse-reward manipulation tasks.
  • Fine-tuning targets spatial and semantic consistency, addressing the weak spatial grounding and limited task-semantic understanding of naive VLM rewards.

arXiv:2602.15872 (cs) [Submitted on 28 Jan 2026]

Title: MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models

Authors: Xunlan Zhou, Xuanlin Chen, Shaowei Zhang, Xiangkun Li, ShengHua Wan, Xiaohai Hu, Yuan Lei, Le Gan, De-chuan Zhan

Abstract: Designing dense reward functions is pivotal for efficient robotic Reinforcement Learning (RL). However, most dense rewards rely on manual engineering, which fundamentally limits the scalability and automation of reinforcement learning. While Vision-Language Models (VLMs) offer a promising path to reward design, naive VLM rewards often misalign with task progress, struggle with spatial grounding, and show limited understanding of task semantics. To address these issues, we propose MARVL (Multi-stAge guidance for Robotic manipulation via Vision-Language models). MARVL fine-tunes a VLM for spatial and semantic consistency and decomposes tasks into multi-stage subtasks with task direction projection for trajectory sensitivity. Empirically, MARVL significantly outperforms existing VLM-reward methods on the Meta-World benchmark, demonstrating superior sample efficiency and robustness on sparse-reward manipulation tasks.

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG) ...
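The abstract does not say how the multi-stage decomposition and the task direction projection are combined into a reward. As a purely illustrative sketch, and not the authors' implementation, one plausible reading is to project the current observation embedding onto the direction from a stage's start embedding to its goal embedding, and to add that fraction to the count of completed stages. Every name below (`stage_progress`, `multi_stage_reward`, the `stages` list of start/goal pairs) is hypothetical, and the 2-D vectors stand in for VLM embeddings.

```python
def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))


def stage_progress(obs, start, goal):
    """Fraction of the start->goal displacement covered by obs, clipped to [0, 1].

    Assumption: progress is the scalar projection of (obs - start) onto the
    task direction (goal - start), normalised by the length of that direction.
    """
    direction = [g - s for g, s in zip(goal, start)]
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        # Degenerate stage: start and goal coincide, so treat it as complete.
        return 1.0
    offset = [o - s for o, s in zip(obs, start)]
    return min(1.0, max(0.0, dot(offset, direction) / norm_sq))


def multi_stage_reward(obs, stages, stage_index):
    """Dense reward in [0, 1]: completed stages plus progress in the current one.

    Assumption: stages is a list of (start, goal) embedding pairs and
    stage_index identifies the stage the agent is currently attempting.
    """
    start, goal = stages[stage_index]
    return (stage_index + stage_progress(obs, start, goal)) / len(stages)


# Two hypothetical stages: move along x, then move along y.
stages = [([0.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0])]
halfway_first = multi_stage_reward([0.5, 0.3], stages, 0)
halfway_second = multi_stage_reward([1.0, 0.5], stages, 1)
wrong_way = multi_stage_reward([-1.0, 0.0], stages, 0)
```

In this sketch, motion orthogonal to the task direction earns nothing and motion against it is clipped to zero, which is one way a reward could be made sensitive to the trajectory rather than only to the final state.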
