[2602.13262] General learned delegation by clones
Summary
The paper presents SELFCEST, an approach that equips a language model with the ability to spawn same-weight clones of itself in separate parallel contexts, improving the accuracy-cost trade-off on complex reasoning tasks at a matched inference budget.
Why It Matters
As language models become integral to more AI applications, making the most of a fixed inference budget is crucial. Serial reasoning and uncoordinated parallel sampling can both waste that budget; SELFCEST addresses these inefficiencies, with potential gains across domains such as math reasoning and multi-hop question answering.
Key Takeaways
- SELFCEST lets a language model spawn same-weight clones of itself that reason in separate parallel contexts (see the sketch after this list).
- The approach improves the accuracy-cost Pareto frontier relative to monolithic baselines at a matched inference budget.
- It exhibits out-of-distribution generalization in both evaluated domains.
- Training is end-to-end via agentic reinforcement learning under a global task reward.
- Gains are shown on challenging math reasoning and long-context multi-hop QA benchmarks.
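To make the mechanism concrete, here is a minimal sketch of clone-based delegation. It is not the paper's implementation: `Clone`, `generate`, and `spawn_clones` are hypothetical names, the even budget split stands in for the learned controller, and the model call is stubbed out.

```python
# Minimal sketch of clone-based delegation; NOT the paper's implementation.
# All names here are hypothetical stand-ins that only illustrate the idea of
# same-weight clones reasoning in separate parallel contexts under a shared
# inference budget.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Clone:
    context: str        # each clone gets its own isolated context
    token_budget: int   # per-branch generation budget set by the controller

def generate(context: str, max_tokens: int) -> str:
    """Stand-in for a call to the shared-weight base model."""
    return f"<answer for {context!r} within {max_tokens} tokens>"

def spawn_clones(task: str, subtasks: list[str], total_budget: int) -> list[str]:
    # A learned controller would decide this split; we divide evenly for clarity.
    per_branch = total_budget // max(len(subtasks), 1)
    clones = [Clone(context=f"{task}\nSubtask: {s}", token_budget=per_branch)
              for s in subtasks]
    # Run every clone in its own parallel context.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: generate(c.context, c.token_budget), clones))

if __name__ == "__main__":
    print(spawn_clones("Prove the claim.", ["case n even", "case n odd"], 2048))
```

The point of the structure is that every branch calls the same shared-weight model but carries its own isolated context and token budget, which is exactly the allocation the paper's learned controller is trained to make.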
Computer Science > Artificial Intelligence
arXiv:2602.13262 (cs) [Submitted on 3 Feb 2026]

Title: General learned delegation by clones
Authors: Darren Li, Meiqi Chen, Chenze Shao, Fandong Meng, Jie Zhou

Abstract: Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts by agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2602.13262 [cs.AI] (arXiv:2602.13262v1 for this version)
DOI: https://doi.org/10.48550/arXiv.2602.13262
Submitted: Tue, 3 Feb 2026 15:53:35 UTC (323 KB)
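The abstract states that training is end-to-end under a global task reward with shared-parameter rollouts. A hedged sketch of what such a REINFORCE-style surrogate could look like follows; `rollout_logprobs` and `task_reward` are hypothetical stand-ins, and the constant baseline is an illustrative choice, not the paper's method.

```python
# Hedged sketch of the training signal described in the abstract: every branch
# in a rollout shares the same parameters and is reinforced by one global task
# reward. The functions below are hypothetical stand-ins, not the paper's code.
import random

def rollout_logprobs(num_branches: int) -> list[float]:
    """Stand-in: per-branch sum of token log-probs under the shared policy."""
    return [random.uniform(-20.0, -1.0) for _ in range(num_branches)]

def task_reward() -> float:
    """Stand-in: 1.0 if the final merged answer is correct, else 0.0."""
    return float(random.random() > 0.5)

def reinforce_surrogate(n_rollouts: int = 8, n_branches: int = 3) -> float:
    # REINFORCE-style surrogate: the one global reward multiplies the summed
    # log-probs of every branch, so spawn decisions and branch contents all
    # receive the same credit.
    baseline = 0.5  # simple constant baseline to reduce variance (illustrative)
    total = 0.0
    for _ in range(n_rollouts):
        total += (task_reward() - baseline) * sum(rollout_logprobs(n_branches))
    return total / n_rollouts

if __name__ == "__main__":
    print("surrogate objective:", reinforce_surrogate())
```

Because every branch shares parameters and receives the same terminal reward, one gradient signal shapes both what each clone writes and when clones are spawned at all.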