[2511.19797] Terminal Velocity Matching

arXiv - AI · 3 min read

Summary

The paper introduces Terminal Velocity Matching (TVM), a novel approach to generative modeling that enhances performance in one- and few-step scenarios by modeling transitions between diffusion timesteps.

Why It Matters

TVM addresses a key limitation of current generative models: producing high-fidelity samples with only one or a few sampling steps. This matters for machine learning and computer vision applications, where demand for generative models that are both efficient and accurate continues to grow.

Key Takeaways

  • TVM generalizes flow matching for improved generative modeling.
  • It models transitions between diffusion timesteps, enhancing fidelity.
  • The method achieves state-of-the-art performance on ImageNet datasets.
  • Architectural changes make TVM efficient for training with transformers.
  • TVM's objective upper-bounds the 2-Wasserstein distance between data and model distributions when the model is Lipschitz continuous.
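
TVM is described as a generalization of flow matching, so the base objective is worth a concrete look. The snippet below is a minimal sketch of standard flow matching (the linear interpolation path and its velocity regression target), not the paper's TVM loss; the arrays and the trivial zero predictor are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x0, x1, t):
    """Linear interpolation path used in flow matching:
    x_t = (1 - t) * x0 + t * x1, with regression target v = x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

# Toy samples standing in for real latents/images.
x1 = rng.normal(size=(4, 8))   # "data" sample
x0 = rng.normal(size=(4, 8))   # noise sample
t = 0.3

x_t, v_target = flow_matching_pair(x0, x1, t)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# A network v_theta(x_t, t) would be regressed onto v_target;
# here a trivial zero predictor just illustrates the loss shape.
loss = mse(np.zeros_like(v_target), v_target)
```

TVM differs from this baseline by modeling the transition between any two timesteps and regularizing the velocity at the terminal time rather than the initial one; that part is not reproduced here.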

Computer Science > Machine Learning
arXiv:2511.19797 (cs) [Submitted on 24 Nov 2025 (v1), last revised 16 Feb 2026 (this version, v3)]

Title: Terminal Velocity Matching
Authors: Linqi Zhou, Mathias Parger, Ayaan Haque, Jiaming Song

Abstract: We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at its terminal time rather than at the initial time. We prove that TVM provides an upper bound on the 2-Wasserstein distance between data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes on Jacobian-Vector Products, which scale well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. It similarly achieves 4.32 1-NFE FID and 2.94 4-NFE FID on ImageNet-512x512, representing state-of-the-art performance for one/few-step models trained from scratch.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
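
The abstract's mention of backward passes on Jacobian-Vector Products (JVPs) is worth unpacking: a JVP is the forward-mode directional derivative J(x)·v of a function. Below is a self-contained toy illustration using a minimal dual-number class; it is not the paper's fused attention kernel, and the `Dual` and `jvp` names are illustrative:

```python
import numpy as np

class Dual:
    """Minimal forward-mode dual number: a value plus a tangent
    (the directional derivative carried alongside the value)."""
    def __init__(self, val, tan):
        self.val = np.asarray(val, dtype=float)
        self.tan = np.asarray(tan, dtype=float)
    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)
    def __mul__(self, other):
        # Product rule: (f g)' = f' g + f g'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

def jvp(f, x, v):
    """Jacobian-vector product: d/ds f(x + s v) evaluated at s = 0."""
    out = f(Dual(x, v))
    return out.val, out.tan

# f(x) = x * x (elementwise), so J(x) v = 2 * x * v.
x = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, 0.0, -1.0])
value, tangent = jvp(lambda d: d * d, x, v)
# tangent == 2 * x * v == [1.0, 0.0, -6.0]
```

Frameworks expose the same operation directly (e.g. forward-mode autodiff APIs); the paper's contribution is making the *backward* pass through such JVPs efficient inside attention, which this sketch does not attempt.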
