[2602.17547] KLong: Training LLM Agent for Extremely Long-horizon Tasks

arXiv - AI · 3 min read

Summary

The paper presents KLong, an open-source LLM agent designed to solve extremely long-horizon tasks, trained via trajectory-splitting supervised fine-tuning (SFT) and progressive reinforcement learning (RL).

Why It Matters

KLong addresses the limitations of existing LLMs on long-horizon tasks, which matter for AI applications that require sustained reasoning over extended periods. By introducing these training techniques, the work could significantly improve AI agents' capabilities in complex, multi-step problem-solving scenarios.

Key Takeaways

  • KLong utilizes trajectory-splitting SFT to enhance model training for long-horizon tasks.
  • The Research-Factory pipeline generates high-quality training data from research papers.
  • Progressive RL training improves the model's ability to handle extended tasks effectively.
  • KLong outperforms existing models like Kimi K2 Thinking on various benchmarks.
  • The methodology could be applied to other domains requiring long-term reasoning.
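The trajectory-splitting idea named in the takeaways above (preserve early context, progressively truncate later context, keep overlap between sub-trajectories) can be illustrated with a minimal sketch. The function name, the list-of-steps representation, and all window parameters below are assumptions for illustration, not the paper's actual recipe.

```python
def split_trajectory(steps, prefix_len=2, window=4, overlap=2):
    """Split one long trajectory into overlapping training samples.

    Every sample keeps the first `prefix_len` steps (the early
    context), followed by a sliding window over the later steps;
    consecutive windows share `overlap` steps. All parameters are
    illustrative placeholders.
    """
    samples = []
    stride = window - overlap
    start = prefix_len
    while start < len(steps):
        # early context + one window of later context
        samples.append(steps[:prefix_len] + steps[start:start + window])
        if start + window >= len(steps):
            break
        start += stride
    return samples
```

For example, a 12-step trajectory with these defaults yields four samples, each beginning with the same two early steps and overlapping its neighbour by two later steps.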

Computer Science > Artificial Intelligence
arXiv:2602.17547 (cs) [Submitted on 19 Feb 2026]

Title: KLong: Training LLM Agent for Extremely Long-horizon Tasks
Authors: Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi

Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training into multiple stages with progressively extended timeouts. Experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, our pr...
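The progressive RL described in the abstract schedules training into stages with progressively extended timeouts. A simple way to sketch such a schedule is shown below; the geometric growth factor, stage count, and base timeout are illustrative assumptions, not values from the paper.

```python
def progressive_timeout_schedule(base_timeout_s, n_stages, growth=2.0):
    """Per-stage timeout budget for staged RL training.

    Each stage grants the agent more wall-clock time than the last,
    so the policy is first trained on short-horizon rollouts and is
    gradually exposed to longer ones. Growth factor and stage count
    are illustrative, not taken from the paper.
    """
    return [int(base_timeout_s * growth ** i) for i in range(n_stages)]
```

With a 600-second base timeout and four stages, this yields budgets of 600, 1200, 2400, and 4800 seconds.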

Related Articles

LLMs

Agents that write their own code at runtime and vote on capabilities, no human in the loop

hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured ...

Reddit - Artificial Intelligence · 1 min
LLMs

Google Maps can now write captions for your photos using AI | TechCrunch

Gemini can now create captions when users are looking to share a photo or video.

TechCrunch - AI · 4 min
LLMs

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Submitted by /u/PatienceHistorical70

Reddit - Machine Learning · 1 min
LLMs

Stop Overcomplicating AI Workflows. This Is the Simple Framework

I’ve been working on building an agentic AI workflow system for business use cases and one thing became very clear very quickly. This is ...

Reddit - Artificial Intelligence · 1 min