
[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

arXiv - Machine Learning, February 24, 2026

Summary

The paper introduces Consistency Mid-Training (CMT), a lightweight stage inserted between diffusion pre-training and the final flow map training. The stage makes flow map training more efficient and, per the authors, achieves state-of-the-art results with significantly reduced resource requirements.

Why It Matters

CMT addresses key challenges in training flow map models, such as instability and high resource consumption. By providing a more efficient training framework, it has the potential to accelerate advancements in computer vision and machine learning applications, making cutting-edge techniques more accessible.

Key Takeaways

CMT introduces a mid-training phase that stabilizes the training of flow map models.
The method significantly reduces the amount of training data and GPU time needed.
CMT achieves state-of-the-art FID scores on popular datasets like CIFAR-10 and ImageNet.
The approach simplifies the learning process for flow map models, enhancing convergence speed.
CMT is positioned as a general framework applicable to various flow map training scenarios.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2509.24526 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

Authors: Zheyuan Hu, Chieh-Hsin Lai, Yuki Mitsufuji, Stefano Ermon

Abstract: Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion model helps, but still requires converting infinitesimal steps into a long-jump map, leaving instability unresolved. We introduce mid-training, the first concept and practical method that inserts a lightweight intermediate stage between the (diffusion) pre-training and the final flow map training (i.e., post-training) for vision generation. Concretely, Consistency Mid-Training (CMT) is a compact and principled stage that trains a model to map points along a solver trajectory from a pre-trained model, starting from a prior sample, directly to the solver-generated clean sample. It yields a trajectory-consistent and stable initialization. This initializer outperforms random and diffusion-based baselines a...
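To make the abstract's description of the mid-training target more concrete, here is a minimal sketch of how such regression pairs could be assembled. It is a toy illustration, not the authors' implementation: the 1-D velocity field `toy_velocity`, the Euler solver, the helper names, and the squared-error loss are all assumptions chosen so the example runs with the standard library alone. The only idea taken from the abstract is that every point on a solver trajectory, started from a prior sample, is paired with the solver-generated clean sample at the end of that same trajectory.

```python
import random


def toy_velocity(x, t, mu=2.0):
    """Hypothetical stand-in for a pre-trained model's ODE velocity.

    Assumes a toy 1-D problem whose data distribution is a point mass at mu,
    with linear interpolation x_t = (1 - t) * mu + t * noise.
    """
    return (x - mu) / t


def solver_trajectory(x_prior, num_steps=8):
    """Euler-integrate from t=1 (prior sample) down to t=0 (clean sample)."""
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    xs = [x_prior]
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        xs.append(xs[-1] + (t_next - t_cur) * toy_velocity(xs[-1], t_cur))
    return list(zip(ts, xs))


def mid_training_pairs(x_prior, num_steps=8):
    """Pair each intermediate point (x_t, t) with the trajectory's clean endpoint."""
    trajectory = solver_trajectory(x_prior, num_steps)
    x_clean = trajectory[-1][1]
    return [((x_t, t), x_clean) for t, x_t in trajectory[:-1]]


def regression_loss(model, pairs):
    """Mean squared error between model(x_t, t) and the clean target (assumed loss)."""
    return sum((model(x_t, t) - target) ** 2 for (x_t, t), target in pairs) / len(pairs)


if __name__ == "__main__":
    random.seed(0)
    pairs = mid_training_pairs(random.gauss(0.0, 1.0))
    # An untrained "identity" map versus a map that already jumps to the clean sample.
    print("identity loss:", regression_loss(lambda x_t, t: x_t, pairs))
    print("oracle loss:  ", regression_loss(lambda x_t, t: 2.0, pairs))
```

In this sketch, all points on one trajectory share a single target, which is one way to read the abstract's phrase "trajectory-consistent". A real setup would replace `toy_velocity` with the pre-trained diffusion model and the lambda functions with a trainable network.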
