[2603.24844] Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
Computer Science > Machine Learning
arXiv:2603.24844 (cs)
[Submitted on 25 Mar 2026]

Title: Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
Authors: Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim

Abstract: Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-world tasks inherently involve multiple valid answers or irreducible uncertainty; examples include medical diagnosis, ambiguous question answering, and settings with incomplete information. In these cases, we would like LMs to generate multiple plausible hypotheses, ideally with confidence estimates for each, and without computationally intensive repeated sampling to surface non-modal answers. This paper describes a multi-answer reinforcement learning approach for training LMs to perform distributional reasoning over multiple answers during inference. We modify the RL objective to enable models to explicitly generate multiple candidate answers in a single forward pass, internalizing aspects of inference-time search into the mod...
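The abstract does not specify the paper's modified RL objective. As a purely illustrative sketch (all names hypothetical, not the authors' formulation), a multi-answer reward might score a single generation containing several candidate answers against a set of reference answers, trading off coverage of the valid answers against the precision of the candidate set, so the policy is encouraged to spread probability mass over multiple plausible answers rather than a single mode:

```python
def multi_answer_reward(candidates, reference_answers):
    """Hypothetical reward for a set of candidate answers produced
    in one forward pass (illustrative only; not the paper's objective).

    Combines coverage (fraction of valid reference answers recovered)
    with precision (fraction of candidates that are valid), penalizing
    both missed valid answers and spurious candidates.
    """
    refs = set(reference_answers)
    hits = {c for c in candidates if c in refs}
    # Coverage: how many of the valid answers did the model surface?
    coverage = len(hits) / len(refs) if refs else 0.0
    # Precision: how many of the model's candidates are actually valid?
    precision = len(hits) / len(candidates) if candidates else 0.0
    # Equal weighting is an arbitrary choice for this sketch.
    return 0.5 * coverage + 0.5 * precision


# Example: two of three valid answers recovered, one invalid candidate.
r = multi_answer_reward(["A", "B", "X"], ["A", "B", "C"])
print(round(r, 3))  # → 0.667
```

Under such a reward, generating only the single modal answer caps coverage, while listing every string indiscriminately collapses precision, so the optimum lies in emitting the full set of plausible answers and nothing more.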