[2602.02709] ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters
Summary
The paper presents ATLAS, an adaptive self-evolutionary research agent that utilizes task-distributed multi-LLM supporters to enhance performance in complex problem-solving tasks.
Why It Matters
ATLAS addresses a limitation of existing multi-LLM systems, which typically freeze the solver after fine-tuning or rely on a static preference-optimization loop, by introducing a dynamic framework that supports continuous adaptation and improvement. This makes it relevant to researchers and practitioners in AI seeking to improve agent performance in non-stationary environments.
Key Takeaways
- ATLAS improves upon static multi-LLM systems by enabling adaptive learning.
- The framework delegates exploration, hyperparameter tuning, and reference-policy management to specialized supporter agents.
- Evolving Direct Preference Optimization (EvoDPO) is a core algorithm that supports continuous policy updates.
- Experimental results show improved stability and performance in challenging tasks.
- The theoretical analysis provides insights into the framework's effectiveness under concept drift.
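The takeaway about EvoDPO refers to adaptively updating a phase-indexed reference policy during preference optimization. A minimal toy sketch of that idea, assuming the standard DPO pairwise loss and a simple phase schedule (the two-action policy, learning rate, and refresh interval here are illustrative, not the paper's implementation):

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def dpo_loss_and_grad(theta, ref_logp, w, l, beta=0.1):
    """DPO loss on one preference pair: w = chosen index, l = rejected index."""
    logp = log_softmax(theta)
    margin = beta * ((logp[w] - ref_logp[w]) - (logp[l] - ref_logp[l]))
    sig = 1.0 / (1.0 + np.exp(-margin))
    loss = -np.log(sig)                      # -log sigmoid(margin)
    p = np.exp(logp)
    grad_logp_w = np.eye(len(theta))[w] - p  # d logp[w] / d theta
    grad_logp_l = np.eye(len(theta))[l] - p  # d logp[l] / d theta
    grad = -(1.0 - sig) * beta * (grad_logp_w - grad_logp_l)
    return loss, grad

theta = np.zeros(2)              # toy policy: logits over two candidate responses
ref_logp = log_softmax(theta)    # phase-0 reference policy
for step in range(200):
    if step > 0 and step % 50 == 0:
        # phase boundary: refresh the reference to the current policy
        # (EvoDPO-style evolving reference, rather than a fixed one)
        ref_logp = log_softmax(theta)
    loss, grad = dpo_loss_and_grad(theta, ref_logp, w=0, l=1)
    theta -= 0.5 * grad
```

With a fixed reference, the KL-like margin saturates once the policy drifts far from it; refreshing the reference each phase keeps the update signal alive, which is the intuition behind continuous policy updates under an evolving reference.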
Computer Science > Artificial Intelligence
arXiv:2602.02709 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 12 Feb 2026 (this version, v2)]
Title: ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters
Authors: Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin
Abstract: Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving, but many either keep the solver frozen after fine-tuning or rely on a static preference-optimization loop, which becomes intractable for long-horizon tasks. We propose ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a task-distributed framework that iteratively develops a lightweight research agent while delegating complementary roles to specialized supporter agents for exploration, hyperparameter tuning, and reference policy management. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. We provide a theoretical regret analysis for a preference-based contextual bandit under concept drift. In addition, experiments were conducted on non-stationary linear contextual bandits and scientific machine learning (SciML) loss reweighting for the 1D Burgers' equation. Both results show ...
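The abstract's experiments involve non-stationary linear contextual bandits under concept drift. A minimal sketch of such a setting, paired with a simple discounted ridge estimator that forgets stale data (this is a generic baseline for illustration, not the paper's EvoDPO algorithm; all dimensions and constants are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000
theta_star = rng.normal(size=d)       # hidden reward parameter
gamma = 0.99                          # discount factor: forgets pre-drift data
A = np.eye(d)                         # discounted ridge design matrix
b = np.zeros(d)                       # discounted reward-weighted features

regret = 0.0
for t in range(T):
    if t == T // 2:                   # abrupt concept drift at mid-horizon
        theta_star = rng.normal(size=d)
    X = rng.normal(size=(K, d))       # context features for K arms
    theta_hat = np.linalg.solve(A, b) # ridge estimate from discounted data
    arm = int(np.argmax(X @ theta_hat))  # greedy choice (UCB bonus omitted for brevity)
    reward = X[arm] @ theta_star + 0.1 * rng.normal()
    # discount old statistics, keep the ridge prior, add the new observation
    A = gamma * A + (1 - gamma) * np.eye(d) + np.outer(X[arm], X[arm])
    b = gamma * b + reward * X[arm]
    regret += (X @ theta_star).max() - X[arm] @ theta_star
```

The discounting makes the estimator re-converge after the drift point instead of averaging over both regimes; the paper's regret analysis concerns the analogous preference-feedback setting, where the learner observes pairwise comparisons rather than scalar rewards.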