[2505.19371] Foundations of Top-$k$ Decoding For Language Models

arXiv - Machine Learning · 4 min read

Summary

This paper develops a theoretical framework that explains and generalizes Top-$k$ decoding in language models, casting decoding as the recovery of a sparse probability distribution via Bregman decoders.

Why It Matters

Top-$k$ decoding is one of the most widely used sampling methods for LLMs, yet a precise theoretical motivation for it has been missing. By supplying that foundation, this work could lead to better-justified and improved decoding strategies across natural language processing and AI applications.

Key Takeaways

  • Top-$k$ decoding is essential for efficient sampling in language models (a minimal sketch follows this list).
  • The paper introduces a theoretical framework that both explains and generalizes Top-$k$ decoding.
  • Bregman decoders, obtained by minimizing a separable Bregman divergence with a sparsity-inducing $\ell_0$ penalty, generalize Top-$k$ decoding.
  • The optimal decoding strategies are shown to be greedy, and the combinatorial objective can be optimized efficiently for a large class of divergences.
  • New decoding strategies are identified, with distinct behaviors in how the kept probabilities are re-normalized.
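
As a concrete illustration of the first takeaway, here is a minimal NumPy sketch of one decoding step under top-$k$ sampling: keep the $k$ largest next-token probabilities, re-normalize them to sum to one, and sample. The function name, toy vocabulary, and probability values are illustrative, not taken from the paper.

    import numpy as np

    def top_k_sample(probs, k, rng=None):
        """Sample a token index with top-k decoding: keep the k largest
        next-token probabilities, re-normalize them to sum to unity,
        and sample from the truncated distribution."""
        rng = rng or np.random.default_rng()
        probs = np.asarray(probs, dtype=float)
        # Indices of the k largest probabilities (their internal order is irrelevant).
        top_idx = np.argpartition(probs, -k)[-k:]
        top_probs = probs[top_idx]
        top_probs /= top_probs.sum()  # re-normalize to sum to one
        return int(rng.choice(top_idx, p=top_probs))

    # Example with a toy next-token distribution over a 6-token vocabulary.
    next_token_probs = [0.42, 0.31, 0.12, 0.06, 0.05, 0.04]
    print(top_k_sample(next_token_probs, k=3))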

Computer Science > Artificial Intelligence

arXiv:2505.19371 (cs) [Submitted on 25 May 2025 (v1), last revised 20 Feb 2026 (this version, v2)]

Title: Foundations of Top-$k$ Decoding For Language Models

Authors: Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban

Abstract: Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and ...
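
The abstract above does not spell out the optimization procedure, so the following is only a hypothetical sketch of the general idea of greedy, $\ell_0$-penalized support selection, not the paper's actual decoder. It assumes candidate supports can be restricted to top-$k$ sets (one reading of "greedy"), uses KL divergence as an example of a separable Bregman divergence, and introduces a hypothetical sparsity weight lam.

    import numpy as np

    def greedy_l0_kl_decoder(probs, lam):
        """Illustrative sketch (not the paper's algorithm): choose a top-k support
        by minimizing KL(q || p) + lam * k over k, where q is p restricted to its
        k largest entries and re-normalized. KL is one example of a separable
        Bregman divergence; lam is a hypothetical sparsity weight. Assumes
        strictly positive probabilities."""
        p = np.asarray(probs, dtype=float)
        order = np.argsort(p)[::-1]  # consider tokens in decreasing-probability order
        best_k, best_obj = 1, np.inf
        for k in range(1, len(p) + 1):
            support = order[:k]
            q = p[support] / p[support].sum()      # truncated, re-normalized distribution
            kl = float(np.sum(q * np.log(q / p[support])))
            obj = kl + lam * k                     # divergence plus l0 penalty on support size
            if obj < best_obj:
                best_k, best_obj = k, obj
        return order[:best_k]                      # indices of the selected support

    print(greedy_l0_kl_decoder([0.42, 0.31, 0.12, 0.06, 0.05, 0.04], lam=0.05))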

