[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Computer Science > Computer Vision and Pattern Recognition

arXiv:2512.00408 (cs)

[Submitted on 29 Nov 2025 (v1), last revised 6 Apr 2026 (this version, v2)]

Title: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Authors: Lingdong Wang, Guan-Ming Su, Divya Kothandaraman, Tsung-Wei Huang, Mohammad Hajiesmaili, Ramesh K. Sitaraman

Abstract: Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy and human perception. We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. The source video is decomposed into three compact modalities: a textual description, a spatiotemporally degraded video, and optional sketches or poses that respectively capture semantic, appearance, and motion cues. A conditional video diffusion model then reconstructs high-quality, temporally coherent videos from these compact representations. Temporal forward filling, token interleaving, and modality-specific codecs are proposed to improve multimodal generation and modality compactness. Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X ...
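The abstract names a "spatiotemporally degraded video" and "temporal forward filling" without defining them. The sketch below shows one plausible reading only, not the authors' method: it assumes degradation means dropping frames and decimating pixels, and that forward filling means repeating the most recent kept frame to restore the original frame count. The function names and the parameters `t_stride` and `s_factor` are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample_space(frame: Frame, factor: int) -> Frame:
    """Keep every `factor`-th pixel along both axes (naive decimation)."""
    return [row[::factor] for row in frame[::factor]]


def degrade(video: Sequence[Frame], t_stride: int, s_factor: int) -> List[Frame]:
    """Assumed degradation: keep every `t_stride`-th frame, then shrink it."""
    return [downsample_space(f, s_factor) for f in video[::t_stride]]


def forward_fill(kept: Sequence[Frame], t_stride: int, n_frames: int) -> List[Frame]:
    """Assumed forward fill: repeat the most recent kept frame for each slot."""
    return [kept[min(t // t_stride, len(kept) - 1)] for t in range(n_frames)]


# Toy 5-frame video of 4x4 frames; frame t is filled with the value t.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(5)]
low = degrade(video, t_stride=2, s_factor=2)        # 3 frames of 2x2 pixels
filled = forward_fill(low, t_stride=2, n_frames=len(video))
print([f[0][0] for f in filled])
```

In this toy setting the degraded stream carries 3 frames of 4 pixels instead of 5 frames of 16, and the forward-filled sequence has the original length again, so a generative model could in principle be conditioned on it frame by frame.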

Originally published on April 07, 2026. Curated by AI News.
