[2602.13262] General learned delegation by clones

arXiv - AI · 3 min read

Summary

The paper presents SELFCEST, an approach that lets a language model spawn same-weight clones of itself for parallel reasoning, achieving better accuracy-cost trade-offs on complex tasks.

Why It Matters

As language models become increasingly integral to AI applications, optimizing their performance under fixed computational budgets is crucial. SELFCEST addresses inefficiencies in reasoning processes, potentially leading to advancements in AI capabilities across various domains, including math reasoning and multi-hop question answering.

Key Takeaways

  • SELFCEST allows language models to spawn clones for parallel reasoning.
  • The approach improves accuracy-cost efficiency over traditional models.
  • Demonstrates out-of-distribution generalization in challenging tasks.
  • Utilizes agentic reinforcement learning for end-to-end training.
  • Enhances performance in math reasoning and long-context QA benchmarks.

Computer Science > Artificial Intelligence

arXiv:2602.13262 (cs) [Submitted on 3 Feb 2026]

Title: General learned delegation by clones
Authors: Darren Li, Meiqi Chen, Chenze Shao, Fandong Meng, Jie Zhou

Abstract: Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts via agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2602.13262 [cs.AI] (or arXiv:2602.13262v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.13262
Submission history: [v1] Tue, 3 Feb 2026 15:53:35 UTC (323 KB)
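The abstract describes an inference-time pattern in which a model spawns same-weight clones in separate parallel contexts and aggregates their results. A minimal toy sketch of that delegation pattern is below; the `clone_solve` stub and the majority-vote aggregation are illustrative assumptions, not the paper's learned RL controller, which allocates generation and context budget end-to-end.

```python
# Toy sketch of clone-style delegation: spawn N independent "clone" contexts
# in parallel and aggregate their answers by majority vote. In SELFCEST the
# spawning and budget allocation are learned; here they are fixed stubs.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def clone_solve(question: str, seed: int) -> str:
    """Stand-in for one clone's independent reasoning context.

    A real system would run the same model weights in a fresh context;
    the seed here just simulates variation across clones (one in four
    clones returns a perturbed, wrong answer).
    """
    base = sum(ord(c) for c in question) % 100
    return str(base) if seed % 4 else str((base + 1) % 100)


def delegate(question: str, n_clones: int = 4) -> str:
    """Spawn n_clones parallel contexts and majority-vote their answers."""
    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        answers = list(
            pool.map(clone_solve, [question] * n_clones, range(n_clones))
        )
    return Counter(answers).most_common(1)[0][0]


print(delegate("What is 2+2?"))
```

With four clones, the single perturbed answer is outvoted by the three agreeing ones, which is the basic accuracy benefit of coordinated parallel sampling that the paper's learned controller aims to obtain more compute-efficiently.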

Related Articles

Tubi is the first streamer to launch a native app within ChatGPT | TechCrunch

Tubi becomes the first streaming service to offer an app integration within ChatGPT, the AI chatbot that millions of users turn to for an...

TechCrunch - AI · 3 min ·
Llms

Anyone out there use Claude Pro/Max at the same time on different screens?

I am asking for feedback. I’m currently using a Claude paid plan (Pro/Max) and was wondering about the logistics of simultaneous use. Sp...

Reddit - Artificial Intelligence · 1 min ·
Llms

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Llms

Looking to build a production-level AI/ML project (agentic systems), need guidance on what to build

Hi everyone, I’m a final-year undergraduate AI/ML student currently focusing on applied AI / agentic systems. So far, I’ve spent time und...

Reddit - ML Jobs · 1 min ·