[2604.02007] Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning
Computer Science > Machine Learning
arXiv:2604.02007 (cs) [Submitted on 2 Apr 2026]

Title: Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning
Authors: Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin

Abstract: Building general-purpose reasoning models using reinforcement learning with verifiable rewards (RLVR) across diverse domains has been widely adopted by frontier open-weight models. However, their training recipes and domain mixtures are often not disclosed. Joint optimization across domains poses significant challenges: domains vary widely in rollout length, problem difficulty, and sample efficiency. Further, models with long chain-of-thought traces increase inference cost and latency, making efficiency critical for practical deployment. We present Apriel-Reasoner, trained with a fully reproducible multi-domain RL post-training recipe on Apriel-Base, a 15B-parameter open-weight LLM, across five domains using public datasets: mathematics, code generation, instruction following, logical puzzles, and function calling. We introduce an adaptive domain sampling mechanism that preserves target domain ratios despite heterogeneous rollout dynamics, and a difficulty-aware extension of the standard length penalty that, with no additional training overhead, encourages longer reasoning f...
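The abstract does not detail how the adaptive domain sampling mechanism works, but one common way to preserve target domain ratios under heterogeneous rollout dynamics is deficit-based greedy sampling: always draw the next batch from the domain furthest below its target share of completed rollouts. The sketch below is a hypothetical illustration under that assumption; the function name, domain names, and ratios are invented for the example and are not taken from the paper.

```python
def adaptive_domain_sampler(target_ratios, completed_counts):
    """Pick the next domain to sample, favoring the domain that is
    furthest behind its target share of completed rollouts.
    (Hypothetical sketch; the paper's exact mechanism is not shown here.)"""
    total = sum(completed_counts.values())
    # Deficit = target share minus realized share so far.
    deficits = {
        d: target_ratios[d] - (completed_counts[d] / total if total else 0.0)
        for d in target_ratios
    }
    # Greedily choose the most under-served domain.
    return max(deficits, key=deficits.get)

# Simulate heterogeneous rollout dynamics: regardless of which domains
# finish rollouts faster, the realized mix tracks the target ratios.
targets = {"math": 0.4, "code": 0.3, "logic": 0.2, "fn_call": 0.1}
counts = {d: 0 for d in targets}
for _ in range(1000):
    d = adaptive_domain_sampler(targets, counts)
    counts[d] += 1
shares = {d: counts[d] / 1000 for d in targets}
```

After 1000 draws, `shares` sits close to `targets` for every domain, which is the ratio-preservation property the abstract attributes to the mechanism.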