Machine Learning, Generative AI, AI Infrastructure, Computer Vision

[2602.02958] Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

arXiv - Machine Learning, February 19, 2026

Summary

The paper presents Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. It targets KV cache memory, which grows with generation history and limits both deployability and long-horizon consistency in generated video.

Why It Matters

In autoregressive video generation, the KV cache grows with generation history and can dominate GPU memory, so managing it efficiently without sacrificing output quality is crucial. This research quantizes the cache to cut its memory footprint by up to 7 times, which makes long video generation more feasible on widely available hardware and could accelerate adoption of these models.

Key Takeaways

Quant VideoGen reduces KV cache memory usage by up to 7 times.
The framework maintains high video generation quality with less than 4% latency overhead.
Introduces Semantic Aware Smoothing and Progressive Residual Quantization techniques.
Establishes a new balance between memory efficiency and video quality.
Addresses a critical bottleneck in deploying autoregressive video generation models.
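A rough way to see why a 2-bit cache lands near, rather than at, the ideal 8x saving over 16-bit storage is to count the metadata that quantization usually carries. The sketch below is back-of-the-envelope arithmetic only: the group sizes, the 32 bits of per-group metadata (a 16-bit scale plus a 16-bit zero-point), and the model dimensions are illustrative assumptions, not figures from the paper.

```python
def effective_bits(bits: int, group_size: int, metadata_bits: int = 32) -> float:
    """Average bits stored per element when every group of `group_size`
    values shares `metadata_bits` of metadata (assumed: scale + zero-point)."""
    return bits + metadata_bits / group_size


def compression_ratio(baseline_bits: int, bits: int, group_size: int,
                      metadata_bits: int = 32) -> float:
    """How many times smaller the quantized cache is than the baseline."""
    return baseline_bits / effective_bits(bits, group_size, metadata_bits)


def kv_cache_gib(num_tokens: int, num_layers: int, num_kv_heads: int,
                 head_dim: int, bits_per_element: float) -> float:
    """KV cache size in GiB; the factor 2 counts keys and values."""
    elements = 2 * num_tokens * num_layers * num_kv_heads * head_dim
    return elements * bits_per_element / 8 / 2**30


# Hypothetical model dimensions, chosen only to make the arithmetic concrete.
dims = dict(num_tokens=100_000, num_layers=30, num_kv_heads=12, head_dim=128)

for group_size in (64, 128):
    ratio = compression_ratio(16, 2, group_size)
    full = kv_cache_gib(bits_per_element=16, **dims)
    small = kv_cache_gib(bits_per_element=effective_bits(2, group_size), **dims)
    print(f"group={group_size}: {full:.2f} GiB -> {small:.2f} GiB ({ratio:.2f}x)")
```

Under these assumptions, larger groups amortize the metadata better and push the ratio closer to 8x, usually at the cost of a coarser fit per group.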

Computer Science > Machine Learning

arXiv:2602.02958 (cs)

[Submitted on 3 Feb 2026 (v1), last revised 17 Feb 2026 (this version, v2)]

Title: Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Authors: Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer

Abstract: Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, preventing deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse to...
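The abstract is truncated here, so the details of QVG's two techniques are not available from this page. As a generic illustration of the ideas it names, the toy sketch below subtracts a shared per-channel mean from a group of similar vectors to obtain low-magnitude residuals, then quantizes those residuals to 2 bits in successive stages. Every function name, the mean-subtraction stand-in for smoothing, the min-max quantizer, and the sample numbers are assumptions made for illustration; this is not the paper's algorithm.

```python
from typing import List, Tuple

Packed = Tuple[List[int], float, float]  # (2-bit codes, scale, minimum)


def quantize_2bit(values: List[float]) -> Packed:
    """Uniform min-max quantization of `values` to codes in {0, 1, 2, 3}."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3 or 1.0  # 3 steps span the range; 1.0 if input is constant
    return [round((v - lo) / scale) for v in values], scale, lo


def dequantize(packed: Packed) -> List[float]:
    codes, scale, lo = packed
    return [lo + c * scale for c in codes]


def residual_quantize(values: List[float], stages: int) -> List[Packed]:
    """Coarse pass first; each later stage quantizes the error left so far."""
    residual, out = list(values), []
    for _ in range(stages):
        packed = quantize_2bit(residual)
        out.append(packed)
        residual = [r - a for r, a in zip(residual, dequantize(packed))]
    return out


def reconstruct(stages: List[Packed]) -> List[float]:
    """Sum the dequantized stages to recover an approximation."""
    return [sum(parts) for parts in zip(*(dequantize(p) for p in stages))]


def max_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


# Three hypothetical, highly similar token vectors (made-up numbers).
tokens = [[8.0, -4.0, 0.5, 2.0], [8.2, -3.9, 0.4, 2.1], [7.9, -4.2, 0.6, 1.8]]

# Stand-in for smoothing: subtract the group's per-channel mean.
centroid = [sum(ch) / len(tokens) for ch in zip(*tokens)]
residuals = [[v - c for v, c in zip(t, centroid)] for t in tokens]

direct_err = max(max_error(t, reconstruct(residual_quantize(t, 1))) for t in tokens)
smooth_err = max(max_error(r, reconstruct(residual_quantize(r, 1))) for r in residuals)
two_stage_err = max(max_error(r, reconstruct(residual_quantize(r, 2))) for r in residuals)
print(direct_err, smooth_err, two_stage_err)
```

In this toy setup the centroid has to be kept at full precision and each extra stage adds another set of 2-bit codes, so the accuracy gain is traded against storage. How QVG itself balances that trade-off is not stated in the excerpt above.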
