[2601.22002] Rate-Distortion Optimization for Transformer Inference

arXiv - Machine Learning April 03, 2026 3 min read

arXiv:2601.22002 (cs)

[Submitted on 29 Jan 2026 (v1), last revised 1 Apr 2026 (this version, v2)]

Title: Rate-Distortion Optimization for Transformer Inference

Authors: Anderson de Andrade, Alon Harell, Ivan V. Bajić

Abstract: Transformers achieve superior performance on many tasks, but impose heavy compute and memory requirements during inference. This inference can be made more efficient by partitioning the process across multiple devices, which, in turn, requires compressing its intermediate representations. We introduce a principled rate-distortion-based framework for lossy compression that learns compact encodings that explicitly trade bitrate for accuracy. Experiments on language benchmarks show that the simplest of the proposed codecs achieves substantial rate savings, outperforming more complex methods. We characterize and analyze the rate-distortion behaviour of transformers, offering a unified lens for understanding performance in representation coding. This formulation extends information-theoretic concepts to define the gap between rate and entropy, and derive some of its bounds. We further develop probably approximately correct (PAC)-style bounds for estimating this gap. For different architectures and tasks, we empirically demonstrate that their rates are driven by these...
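The trade of bitrate for accuracy described in the abstract can be illustrated with a toy sketch. This is not the paper's codec: the abstract describes learned encodings and a task-accuracy notion of distortion, whereas the code below assumes a fixed uniform scalar quantizer, mean squared error as a stand-in distortion, empirical entropy as a stand-in rate, and synthetic Gaussian values in place of real transformer activations. All function names, and the Lagrangian form `rate + lam * distortion`, are illustrative assumptions.

```python
import math
import random
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer bin index."""
    return [round(v / step) for v in values]


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (assumed stand-in for the rate)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mse(original, indices, step):
    """Mean squared error between values and their dequantized reconstruction
    (assumed stand-in for distortion; the paper targets task accuracy)."""
    total = sum((v - q * step) ** 2 for v, q in zip(original, indices))
    return total / len(original)


def rd_point(values, step):
    """Return the (rate, distortion) pair obtained with a given quantizer step."""
    indices = quantize(values, step)
    return empirical_entropy_bits(indices), mse(values, indices, step)


def rd_cost(rate, distortion, lam):
    """Lagrangian cost: lam sets how much distortion is penalized relative to rate."""
    return rate + lam * distortion


# Synthetic stand-in for an intermediate representation (hypothetical data).
rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

for step in (0.25, 0.5, 1.0, 2.0):
    rate, dist = rd_point(activations, step)
    print(f"step={step:<4} rate={rate:.3f} bits/value  mse={dist:.4f}")
```

Coarser steps spend fewer bits per value and incur more error, so sweeping `step` traces out a rate-distortion curve. A learned codec of the kind the abstract describes would presumably optimize the encoding itself against a cost of this general shape rather than sweep a fixed quantizer.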

Originally published on April 03, 2026. Curated by AI News.