[2505.17288] Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Statistics > Machine Learning

arXiv:2505.17288 (stat)

[Submitted on 22 May 2025 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Authors: Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, supervised fine-tuning (SFT), trains a new next-token predictor on good generations. The second, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2505.17288 [stat.ML] (or arXiv:2505.17288v2 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.17288
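To make the Best-of-N procedure described in the abstract concrete, here is a minimal toy sketch on a bit-string task. The base model, the reward function, and the target string below are hypothetical stand-ins for illustration, not the paper's construction: the "base model" emits uniformly random bits, and the "reward model" is replaced by a simple match score against a target.

```python
import random

random.seed(0)
LENGTH = 8  # response length (number of bits per generation)

def base_model():
    """Stand-in for the unaltered base model: each bit uniform at random."""
    return [random.randint(0, 1) for _ in range(LENGTH)]

def reward(bits, target):
    """Stand-in for a learned reward model: fraction of bits matching the target."""
    return sum(b == t for b, t in zip(bits, target)) / LENGTH

def best_of_n(n, target):
    """Best-of-N: sample n responses from the base model, keep the highest-reward one."""
    candidates = [base_model() for _ in range(n)]
    return max(candidates, key=lambda bits: reward(bits, target))

target = [1] * LENGTH          # hypothetical "good" generation
chosen = best_of_n(64, target)
print(reward(chosen, target))  # typically well above the base rate of 0.5
```

Supervised fine-tuning would instead use the high-reward samples to retrain the next-token predictor itself; the sketch only illustrates the selection side of the comparison.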