[2604.02322] Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
Computer Science > Machine Learning
arXiv:2604.02322 (cs) [Submitted on 2 Apr 2026]
Title: Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
Authors: Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu
Abstract: Large Language Models that employ Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption, which inflates inference costs. Existing efficiency methods, such as explicit length penalties, difficulty estimators, or multi-stage curricula, either degrade reasoning quality or require complex training pipelines. We introduce Batched Contextual Reinforcement (BCR), a minimalist, single-stage training paradigm that unlocks efficient reasoning through a simple structural modification: the model is trained to solve N problems simultaneously within a shared context window and is rewarded purely by per-instance accuracy. This formulation creates an implicit token budget and yields several key findings: (1) We identify a novel task-scaling law: as the number of concurrent problems N increases during inference, per-problem token usage decreases monotonically while accuracy degrades far more gracefully than for baselines, establishing N as a controllable throughput dimension. (2) BCR challenges the traditional accuracy-efficiency trade-off by demonstrating a "free lunch" phenomenon...
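The core formulation described in the abstract, packing N problems into one shared context and rewarding each instance's accuracy independently, can be sketched as follows. This is an illustrative sketch only: the abstract does not give the paper's prompt format or reward code, so the helper names (`build_batched_prompt`, `per_instance_reward`) and the exact layout are assumptions.

```python
# Hypothetical sketch of BCR's batched-context setup; not the authors' code.

def build_batched_prompt(problems):
    """Pack N problems into a single shared context window."""
    lines = [f"Problem {i + 1}: {p}" for i, p in enumerate(problems)]
    lines.append("Answer each problem in order, one line per answer.")
    return "\n".join(lines)

def per_instance_reward(predicted, gold):
    """Reward is purely per-instance accuracy: 1.0 per correct answer.

    No explicit length penalty appears in the signal; the shared context
    window itself acts as the implicit token budget across the N problems.
    """
    return [float(p == g) for p, g in zip(predicted, gold)]

# Example: N = 3 concurrent problems share one context.
prompt = build_batched_prompt(["2+2=?", "3*3=?", "10-4=?"])
rewards = per_instance_reward(["4", "9", "5"], ["4", "9", "6"])
# rewards == [1.0, 1.0, 0.0]
```

At inference time, N is the throughput knob the abstract's task-scaling law refers to: raising N shrinks the per-problem share of the context, trading a graceful accuracy decline for lower per-problem token usage.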