[2505.17288] Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation




Statistics > Machine Learning

arXiv:2505.17288 (stat)

Submitted on 22 May 2025 (v1), last revised 28 Mar 2026 (this version, v2)

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2505.17288 [stat.ML] (or arXiv:2505.17288v2 [stat.ML] for this version)

https://doi.org/10.48550/arXiv.2505.17288

Authors: Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N, trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy a better rate of convergence in either n or a rate of convergence with better dependence on the response length.
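To make the two methods in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's construction: the independent-bit base model, the reward that counts ones, and the per-position frequency fit standing in for a next token predictor are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random
from typing import Callable, List

BitString = List[int]


def sample_base_model(length: int, p_one: float, rng: random.Random) -> BitString:
    """Toy base model (assumption): each bit is drawn independently, P(bit = 1) = p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(length)]


def best_of_n(
    n: int,
    length: int,
    p_one: float,
    reward: Callable[[BitString], float],
    rng: random.Random,
) -> BitString:
    """Best-of-N: draw n responses from the unaltered base model, keep the top-scoring one."""
    candidates = [sample_base_model(length, p_one, rng) for _ in range(n)]
    return max(candidates, key=reward)


def toy_reward(bits: BitString) -> float:
    """Stand-in for a learned reward model (assumption): more ones is better."""
    return float(sum(bits))


def fit_sft(good_generations: List[BitString]) -> List[float]:
    """Toy stand-in for supervised fine-tuning (assumption): estimate, for each
    position, the frequency of a 1 among the good generations. A real next token
    predictor would also condition on the preceding bits."""
    count = len(good_generations)
    length = len(good_generations[0])
    return [sum(g[i] for g in good_generations) / count for i in range(length)]


def sample_sft(probs: List[float], rng: random.Random) -> BitString:
    """Sample one bit string from the fitted per-position probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]
```

In this sketch, Best-of-N leaves the generator untouched and spends its effort on selection, while the fine-tuning stand-in changes the generator itself. A call such as `best_of_n(8, 16, 0.5, toy_reward, random.Random(0))` returns the highest-reward string among eight base-model samples.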

Originally published on March 31, 2026. Curated by AI News.


