[2602.20119] NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning

Summary

NovaPlan is a hierarchical framework for zero-shot long-horizon robot manipulation. It combines closed-loop vision-language model (VLM) and video planning with geometrically grounded execution, so a robot can attempt multi-step tasks without prior demonstrations or task-specific training.

Why It Matters

This research addresses a significant challenge in robotics: performing complex, multi-step tasks without prior demonstrations. NovaPlan pairs high-level semantic reasoning with geometrically grounded low-level execution, which could make robots more practical for real-world tasks they were not specifically trained on.

Key Takeaways

  • NovaPlan enables robots to perform long-horizon manipulation tasks zero-shot, without prior demonstrations.
  • The framework combines a VLM planner and video generation with closed-loop monitoring of robot execution.
  • When a single step fails, the planner re-plans autonomously, so the robot can recover and continue the task.
  • Low-level actions use object keypoints and human hand poses extracted from the generated videos, with a switching mechanism that picks the better of the two as the reference.
  • The approach is demonstrated on complex assembly tasks and the Functional Manipulation Benchmark.
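
The closed-loop behavior described in the takeaways above can be sketched roughly as follows. This is a minimal illustration, not NovaPlan's implementation: `run_closed_loop`, the `plan`/`execute`/`verify` callables, and the `max_replans` limit are all hypothetical names and assumptions. In a real system the planner and verifier would be backed by a VLM and the executor by a robot controller; here they are simple stubs.

```python
from typing import Callable, Dict, List


def run_closed_loop(
    task: str,
    plan: Callable[[str, List[str]], List[str]],  # hypothetical: (task, completed) -> remaining sub-goals
    execute: Callable[[str], None],               # hypothetical: run one sub-goal on the robot
    verify: Callable[[str], bool],                # hypothetical: did the sub-goal succeed?
    max_replans: int = 3,                         # assumed safety limit, not from the paper
) -> List[str]:
    """Execute sub-goals one at a time and re-plan after any failed step."""
    completed: List[str] = []
    replans = 0
    queue = plan(task, completed)
    while queue:
        subgoal = queue.pop(0)
        execute(subgoal)
        if verify(subgoal):
            completed.append(subgoal)
            continue
        # The step failed: ask the planner for a fresh plan from the current state.
        replans += 1
        if replans > max_replans:
            raise RuntimeError(f"giving up on {task!r} after {max_replans} re-plans")
        queue = plan(task, completed)
    return completed


# Stub components that simulate one failed attempt at "insert peg".
STEPS = ["pick peg", "insert peg", "close lid"]
attempts: Dict[str, int] = {}


def stub_plan(task: str, completed: List[str]) -> List[str]:
    return [s for s in STEPS if s not in completed]


def stub_execute(subgoal: str) -> None:
    attempts[subgoal] = attempts.get(subgoal, 0) + 1


def stub_verify(subgoal: str) -> bool:
    # "insert peg" fails on its first attempt only.
    return not (subgoal == "insert peg" and attempts[subgoal] == 1)


done = run_closed_loop("assemble", stub_plan, stub_execute, stub_verify)
```

In this toy run the failed step is retried after one re-plan, and the remaining sub-goals then complete in order.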

arXiv:2602.20119 (Computer Science > Robotics)
Submitted on 23 Feb 2026

Title: NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning

Authors: Jiahui Fu, Junyu Nan, Lingfeng Sun, Hongyu Li, Jianing Qian, Jennifer L. Barry, Kris Kitani, George Konidaris

Abstract: Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) and video generation models can decompose tasks and imagine outcomes, they often lack the physical grounding necessary for real-world execution. We introduce NovaPlan, a hierarchical framework that unifies closed-loop VLM and video planning with geometrically grounded robot execution for zero-shot long-horizon manipulation. At the high level, a VLM planner decomposes tasks into sub-goals and monitors robot execution in a closed loop, enabling the system to recover from single-step failures through autonomous re-planning. To compute low-level robot actions, we extract and utilize both task-relevant object keypoints and human hand poses as kinematic priors from the generated videos, and employ a switching mechanism to choose the better one as a reference for robot actions, maintaining stable execution even under heavy occlusion or depth inaccuracy. We ...
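
The switching mechanism mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the "better" reference is determined, so the per-source `confidence` score, the `min_confidence` threshold, and every name below (`Reference`, `choose_reference`) are assumptions made purely for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Position = Tuple[float, float, float]  # simplified: 3D position only, no orientation


@dataclass
class Reference:
    source: str                     # "object_keypoints" or "hand_pose"
    waypoints: Sequence[Position]   # trajectory extracted from the generated video
    confidence: float               # hypothetical quality score in [0, 1]


def choose_reference(
    keypoints: Optional[Reference],
    hand: Optional[Reference],
    min_confidence: float = 0.5,    # assumed threshold, not from the paper
) -> Reference:
    """Pick whichever available reference has the higher confidence score."""
    candidates = [
        ref for ref in (keypoints, hand)
        if ref is not None and ref.confidence >= min_confidence
    ]
    if not candidates:
        raise ValueError("no reference is reliable enough to execute")
    return max(candidates, key=lambda ref: ref.confidence)


# Example: the object is heavily occluded, so its keypoint track scores low
# and the hand-pose track is used instead.
occluded_keypoints = Reference("object_keypoints", [(0.1, 0.0, 0.2)], confidence=0.2)
hand_track = Reference("hand_pose", [(0.1, 0.0, 0.25)], confidence=0.8)
selected = choose_reference(occluded_keypoints, hand_track)
```

A score of this kind could in principle reflect how many keypoints stay visible or how reliable the depth readings are, which are the failure modes the abstract names, but that mapping is an assumption here.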
