[2602.17547] KLong: Training LLM Agent for Extremely Long-horizon Tasks
Summary
The paper presents KLong, an open-source LLM agent for solving extremely long-horizon tasks, trained by cold-starting with trajectory-splitting SFT and then scaling with progressive RL.
Why It Matters
KLong addresses a key limitation of existing LLMs: handling long-horizon tasks that demand sustained reasoning over extended periods. By introducing new training techniques for such tasks, this research could significantly enhance the capabilities of AI agents in complex problem-solving scenarios.
Key Takeaways
- KLong utilizes trajectory-splitting SFT to enhance model training for long-horizon tasks.
- The Research-Factory pipeline generates high-quality training data from research papers.
- Progressive RL training, scheduled in stages with increasing timeouts, improves the model's ability to handle long-horizon tasks.
- KLong outperforms existing models like Kimi K2 Thinking on various benchmarks.
- The methodology could be applied to other domains requiring long-term reasoning.
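The trajectory-splitting idea can be made concrete with a minimal sketch. This is an illustrative reconstruction from the paper's description only (preserve early context, progressively truncate later context, keep overlap between sub-trajectories); the function name and parameters (`prefix_len`, `window`, `overlap`) are assumptions, not KLong's actual implementation.

```python
def split_trajectory(steps, prefix_len=2, window=4, overlap=1):
    """Split a long trajectory into overlapping sub-trajectories.

    Every sub-trajectory keeps the first `prefix_len` steps (the preserved
    early context), followed by a sliding window of `window` later steps.
    The window advances by `window - overlap`, so consecutive
    sub-trajectories share `overlap` steps of context.
    """
    prefix = steps[:prefix_len]
    rest = steps[prefix_len:]
    stride = window - overlap
    subs = []
    start = 0
    while start < len(rest):
        subs.append(prefix + rest[start:start + window])
        if start + window >= len(rest):
            break
        start += stride
    return subs

# A 10-step trajectory yields three overlapping sub-trajectories,
# each beginning with the same two early-context steps.
traj = [f"step{i}" for i in range(10)]
for sub in split_trajectory(traj):
    print(sub)
```

Each split stays within the model's context budget while the shared prefix and overlapping windows keep the sub-trajectories mutually consistent, which is the stated goal of the technique.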
arXiv:2602.17547 (cs) [Submitted on 19 Feb 2026]
Title: KLong: Training LLM Agent for Extremely Long-horizon Tasks
Authors: Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi
Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training into multiple stages with progressively extended timeouts. Experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, our pr...
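The progressive RL schedule described in the abstract (multiple stages with progressively extended timeouts) can be sketched as follows. Stage count, timeout values, and the growth factor are illustrative assumptions; the paper does not specify them here.

```python
def progressive_timeouts(base_timeout=600, stages=4, growth=2.0):
    """Return a per-stage rollout timeout schedule in seconds.

    Each stage extends the previous timeout by `growth`, so the agent is
    exposed to longer horizons only after training on shorter ones.
    (Illustrative values: the paper does not publish the schedule.)
    """
    return [int(base_timeout * growth ** i) for i in range(stages)]

def train(policy_update, rollout, schedule):
    """Run one RL stage per timeout in the schedule.

    `rollout(timeout)` is assumed to collect episodes capped at `timeout`
    seconds; `policy_update(stage, episodes)` applies the RL update.
    Both are placeholders for the trainer's actual hooks.
    """
    for stage, timeout in enumerate(schedule):
        episodes = rollout(timeout)
        policy_update(stage, episodes)

print(progressive_timeouts())  # [600, 1200, 2400, 4800]
```

The design choice is a curriculum over episode length: early stages train cheaply on short rollouts, and later stages extend the timeout so the agent gradually learns to sustain reasoning across longer horizons.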