[2507.13231] VITA: Vision-to-Action Flow Matching Policy


Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.13231 (cs)

[Submitted on 17 Jul 2025 (v1), last revised 3 Mar 2026 (this version, v4)]

Title: VITA: Vision-to-Action Flow Matching Policy

Authors: Dechen Gao, Boqi Zhao, Andrew Lee, Ian Chuang, Hanchu Zhou, Hang Wang, Zhe Zhao, Junshan Zhang, Iman Soltani

Abstract: Conventional flow matching and diffusion-based policies sample via iterative denoising from standard noise distributions (e.g., Gaussian), and require conditioning modules to repeatedly incorporate visual information during the generative process, incurring substantial time and memory overhead. To reduce the complexity, we develop VITA, VIsion-To-Action policy, a noise-free and conditioning-free flow matching policy learning framework that directly flows from visual representations to latent actions. Since the source of the flow is visually grounded, VITA eliminates the need for visual conditioning during generation. As expected, bridging vision and action is challenging, because actions are lower-dimensional, less structured, and sparser than visual representations; moreover, flow matching requires the source and target to have the same dimensionality. To overcome this, we introduce an action autoencoder that maps raw actions into a structured latent space aligned with visual latents, trained jointly with flo...
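The core idea in the abstract, a flow whose source is a visual latent rather than Gaussian noise, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a straight-line (linear interpolation) path between source and target, which the abstract does not specify, and it omits the action autoencoder entirely by treating the action latent as given. All names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `velocity_fn`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: flow matching from a visual latent to an action
# latent of the same dimensionality. The linear path and every function name
# here are assumptions, not details taken from the paper.

def interpolate(z_vis, z_act, t):
    """Point at time t on the straight path from z_vis (t=0) to z_act (t=1)."""
    return [(1.0 - t) * v + t * a for v, a in zip(z_vis, z_act)]


def target_velocity(z_vis, z_act):
    """Velocity of the straight path: target minus source, constant in t."""
    return [a - v for v, a in zip(z_vis, z_act)]


def flow_matching_loss(velocity_fn, z_vis, z_act, t):
    """Mean squared error between predicted and target velocity at time t.

    velocity_fn takes only (z_t, t): there is no separate visual conditioning
    input, because the source of the flow is itself the visual latent.
    """
    z_t = interpolate(z_vis, z_act, t)
    pred = velocity_fn(z_t, t)
    tgt = target_velocity(z_vis, z_act)
    return sum((p - g) ** 2 for p, g in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity_fn, z_vis, num_steps=4):
    """Integrate dz/dt = velocity_fn(z, t) from t=0 to t=1 with Euler steps,
    starting at the visual latent instead of a noise sample."""
    z = list(z_vis)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(z, k * dt)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z
```

In this sketch, both latents must have the same length, which mirrors the abstract's point that flow matching requires source and target to share a dimensionality. In a full system, the resulting action latent would still need to be decoded back into raw actions, a step this sketch leaves out.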

Originally published on March 05, 2026. Curated by AI News.

