[2602.02958] Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Summary

The paper presents Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. It targets KV cache memory, which grows with generation history and limits both deployability and long-horizon consistency.

Why It Matters

In autoregressive video generation, the KV cache grows with generation history and often exceeds 30 GB, which rules out deployment on widely available hardware. A tight cache budget also shrinks the model's effective working memory, which degrades long-horizon consistency in identity, layout, and motion. This research reduces the memory requirement substantially, making long video generation more feasible on standard hardware.

Key Takeaways

  • Quant VideoGen reduces KV cache memory usage by up to 7 times.
  • The framework maintains high video generation quality with less than 4% latency overhead.
  • Introduces Semantic Aware Smoothing and Progressive Residual Quantization techniques.
  • Establishes a new balance between memory efficiency and video quality.
  • Addresses a critical bottleneck in deploying autoregressive video generation models.
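
As a back-of-envelope check on the memory figure above (not taken from the paper), the sketch below shows why a 2-bit cache typically yields somewhat less than the ideal 8x saving. It assumes a 16-bit baseline cache and a per-group scale and zero point stored in 16 bits each; the group sizes and the function name `compression_ratio` are hypothetical.

```python
def compression_ratio(bits: int, group_size: int,
                      metadata_bits: int = 32, baseline_bits: int = 16) -> float:
    """Baseline bits per element divided by quantized bits per element.

    Assumption: each group of `group_size` elements shares `metadata_bits`
    of quantization metadata (for example a 16-bit scale and a 16-bit zero point).
    """
    effective_bits = bits + metadata_bits / group_size
    return baseline_bits / effective_bits


if __name__ == "__main__":
    for group_size in (32, 64, 128):  # hypothetical group sizes
        ratio = compression_ratio(bits=2, group_size=group_size)
        # Scale the 30 GB cache size mentioned in the abstract by this ratio.
        print(f"group={group_size}: {ratio:.2f}x, 30 GB -> {30 / ratio:.2f} GB")
```

Under these assumptions the metadata overhead alone keeps the ratio below 8x, which is at least consistent with a reported reduction of "up to 7 times".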

arXiv:2602.02958 (cs), Computer Science > Machine Learning
Submitted on 3 Feb 2026 (v1), last revised 17 Feb 2026 (v2)

Title: Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Authors: Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer

Abstract: Despite rapid progress in autoregressive video diffusion, an emerging system algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, preventing deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse to...
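
The abstract is cut off before it finishes describing Progressive Residual Quantization, so the following is only a generic sketch of the two ideas it names, not the paper's algorithm. It assumes that smoothing means subtracting a per-group centroid from similar values (the group ids are supplied by hand and are hypothetical) and that the residual quantizer is plain multi-stage uniform quantization. All function names are invented for illustration.

```python
from typing import Dict, List, Tuple

Stage = Tuple[List[int], float, float]  # (codes, scale, minimum)


def quantize_uniform(values: List[float], bits: int) -> Stage:
    """Asymmetric uniform quantization of `values` to `bits` bits."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_uniform(codes: List[int], scale: float, lo: float) -> List[float]:
    return [c * scale + lo for c in codes]


def subtract_centroids(values: List[float],
                       groups: List[int]) -> Tuple[List[float], Dict[int, float]]:
    """Assumed stand-in for smoothing: remove each group's mean from its members."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    centroids = {g: sums[g] / counts[g] for g in sums}
    return [v - centroids[g] for v, g in zip(values, groups)], centroids


def residual_quantize(values: List[float], bits: int = 2,
                      stages: int = 2) -> Tuple[List[Stage], List[float]]:
    """Quantize, then repeatedly quantize what the earlier stages missed."""
    approx = [0.0] * len(values)
    encoded: List[Stage] = []
    for _ in range(stages):
        residual = [v - a for v, a in zip(values, approx)]
        stage = quantize_uniform(residual, bits)
        encoded.append(stage)
        approx = [a + d for a, d in zip(approx, dequantize_uniform(*stage))]
    return encoded, approx


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    values = [10.0, 10.2, 9.9, 3.0, 3.1, 2.9, -5.0, -5.1, -4.9]
    groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical similarity groups
    _, direct = residual_quantize(values, bits=2, stages=1)
    residuals, _ = subtract_centroids(values, groups)
    _, smoothed = residual_quantize(residuals, bits=2, stages=2)
    print("direct 2-bit error:", max_abs_error(values, direct))
    print("smoothed two-stage error:", max_abs_error(residuals, smoothed))
```

In this toy setup, removing the centroids shrinks the range the 2-bit grid has to cover, and each extra stage refines the error left by the previous one. The cost is one more set of codes per stage, plus the centroids, which must be stored to reconstruct the original values.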
