[2505.17288] Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Statistics > Machine Learning

arXiv:2505.17288 (stat)

[Submitted on 22 May 2025 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Authors: Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, supervised fine-tuning (SFT), trains a new next-token predictor on good generations. The second, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2505.17288 [stat.ML] (or arXiv:2505.17288v2 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.17288
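To make the Best-of-N procedure described in the abstract concrete, here is a minimal toy sketch on a bit-string task. The base model, the reward function, and the target string below are hypothetical stand-ins for illustration, not the paper's construction: the "base model" emits uniformly random bits, and the "reward model" is replaced by a simple match score against a target.

```python
import random

random.seed(0)
LENGTH = 8  # response length (number of bits per generation)

def base_model():
    """Stand-in for the unaltered base model: each bit uniform at random."""
    return [random.randint(0, 1) for _ in range(LENGTH)]

def reward(bits, target):
    """Stand-in for a learned reward model: fraction of bits matching the target."""
    return sum(b == t for b, t in zip(bits, target)) / LENGTH

def best_of_n(n, target):
    """Best-of-N: sample n responses from the base model, keep the highest-reward one."""
    candidates = [base_model() for _ in range(n)]
    return max(candidates, key=lambda bits: reward(bits, target))

target = [1] * LENGTH          # hypothetical "good" generation
chosen = best_of_n(64, target)
print(reward(chosen, target))  # typically well above the base rate of 0.5
```

Supervised fine-tuning would instead use the high-reward samples to retrain the next-token predictor itself; the sketch only illustrates the selection side of the comparison.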