[2602.13262] General learned delegation by clones

arXiv - AI · 3 min read

Summary

The paper presents SELFCEST, an approach that lets a language model spawn same-weight clones of itself for parallel reasoning, achieving better accuracy-cost trade-offs on complex tasks.

Why It Matters

As language models become increasingly integral to AI applications, optimizing their performance under fixed computational budgets is crucial. SELFCEST addresses inefficiencies in reasoning processes, potentially leading to advancements in AI capabilities across various domains, including math reasoning and multi-hop question answering.

Key Takeaways

  • SELFCEST allows language models to spawn clones for parallel reasoning.
  • The approach improves accuracy-cost efficiency over traditional models.
  • Demonstrates out-of-distribution generalization in challenging tasks.
  • Utilizes agentic reinforcement learning for end-to-end training.
  • Enhances performance in math reasoning and long-context QA benchmarks.

Computer Science > Artificial Intelligence

arXiv:2602.13262 (cs) [Submitted on 3 Feb 2026]

Title: General learned delegation by clones
Authors: Darren Li, Meiqi Chen, Chenze Shao, Fandong Meng, Jie Zhou

Abstract: Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts via agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2602.13262 [cs.AI] (or arXiv:2602.13262v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.13262
Submission history: [v1] Tue, 3 Feb 2026 15:53:35 UTC (323 KB)
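The abstract describes an inference-time pattern in which a model spawns same-weight clones in separate parallel contexts and aggregates their results. A minimal toy sketch of that delegation pattern is below; the `clone_solve` stub and the majority-vote aggregation are illustrative assumptions, not the paper's learned RL controller, which allocates generation and context budget end-to-end.

```python
# Toy sketch of clone-style delegation: spawn N independent "clone" contexts
# in parallel and aggregate their answers by majority vote. In SELFCEST the
# spawning and budget allocation are learned; here they are fixed stubs.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def clone_solve(question: str, seed: int) -> str:
    """Stand-in for one clone's independent reasoning context.

    A real system would run the same model weights in a fresh context;
    the seed here just simulates variation across clones (one in four
    clones returns a perturbed, wrong answer).
    """
    base = sum(ord(c) for c in question) % 100
    return str(base) if seed % 4 else str((base + 1) % 100)


def delegate(question: str, n_clones: int = 4) -> str:
    """Spawn n_clones parallel contexts and majority-vote their answers."""
    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        answers = list(
            pool.map(clone_solve, [question] * n_clones, range(n_clones))
        )
    return Counter(answers).most_common(1)[0][0]


print(delegate("What is 2+2?"))
```

With four clones, the single perturbed answer is outvoted by the three agreeing ones, which is the basic accuracy benefit of coordinated parallel sampling that the paper's learned controller aims to obtain more compute-efficiently.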

Related Articles

Tubi is the first streamer to launch a native app within ChatGPT | TechCrunch

Tubi becomes the first streaming service to offer an app integration within ChatGPT, the AI chatbot that millions of users turn to for an...

TechCrunch - AI · 3 min ·
Llms

Anyone out there use Claude Pro/Max at the same time on different screens?

I am asking for feedback. I’m currently using a Claude paid plan (Pro/Max) and was wondering about the logistics of simultaneous use. Sp...

Reddit - Artificial Intelligence · 1 min ·
Llms

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Llms

Looking to build a production-level AI/ML project (agentic systems), need guidance on what to build

Hi everyone, I’m a final-year undergraduate AI/ML student currently focusing on applied AI / agentic systems. So far, I’ve spent time und...

Reddit - ML Jobs · 1 min ·