[2602.16839] Training Large Reasoning Models Efficiently via Progressive Thought Encoding

arXiv - Machine Learning · 4 min read

Summary

This paper presents Progressive Thought Encoding, a novel method for training large reasoning models (LRMs) that enhances efficiency and accuracy by reducing memory usage during reinforcement learning.

Why It Matters

As large reasoning models become increasingly integral to AI applications, optimizing their training processes is crucial. This research addresses significant challenges in memory management and efficiency, making it relevant for developers and researchers in machine learning and AI.

Key Takeaways

  • Progressive Thought Encoding reduces memory usage during RL training without sacrificing performance.
  • The method shows significant improvements in reasoning accuracy across multiple models and benchmarks.
  • It allows for effective reasoning under fixed-size caches, addressing a critical barrier in LRM training.
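The fixed-size-cache idea in the takeaways can be illustrated with a toy sketch. Note the merge rule below (mean-pooling the two oldest entries) and the name `progressive_encode` are illustrative assumptions, not the authors' learned encoder; the point is only that memory stays constant however long the rollout runs:

```python
import numpy as np

CACHE_SLOTS = 4   # fixed-size cache: number of summary vectors kept
HIDDEN_DIM = 8    # toy hidden-state dimensionality

def progressive_encode(cache, new_state):
    """Fold a new intermediate reasoning state into a fixed-size cache.

    When the cache is full, the two oldest entries are merged (mean-pooled)
    into one summary vector, so the cache never exceeds CACHE_SLOTS entries.
    This merge rule is a hypothetical stand-in for a learned encoder.
    """
    if len(cache) < CACHE_SLOTS:
        return cache + [new_state]
    merged = (cache[0] + cache[1]) / 2.0  # compress the oldest context
    return [merged] + cache[2:] + [new_state]

# Simulate a long reasoning rollout: memory use stays bounded throughout.
cache = []
for step in range(100):
    state = np.random.randn(HIDDEN_DIM)
    cache = progressive_encode(cache, state)
    assert len(cache) <= CACHE_SLOTS
```

Contrast this with a full autoregressive cache, which would hold all 100 states; here the old context is folded into summaries instead of being dropped outright, which is the behavior a sliding window cannot provide.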

Computer Science > Machine Learning · arXiv:2602.16839 (cs) · Submitted on 18 Feb 2026

Title: Training Large Reasoning Models Efficiently via Progressive Thought Encoding

Authors: Zeliang Zhang, Xiaodong Liu, Hao Cheng, Hao Sun, Chenliang Xu, Jianfeng Gao

Abstract: Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. While sliding-window cache strategies can bound memory, they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into fixed-size vector representations, our approach eliminates the need to backpropagate through full-cache rollouts, thereby reducing memory usage while maintaining constant memory during inference. Experiments on three models, including Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, and DeepSeek-R1-Distill-Llama-8B, on six widely used challenging mathematical benchmarks show consistent gains: our method achieves +19.3% improvement over LoRA-based fine-tuning and +29.9% over LRMs without fine-tuning [...]
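The memory pressure the abstract describes comes from the key/value (KV) cache growing linearly with rollout length during autoregressive decoding. A back-of-the-envelope sketch makes the motivation concrete; the model shape, dtype, and window size below are assumptions (loosely modeled on a 7B-class transformer with grouped-query attention), not figures from the paper:

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, dtype_bytes=2, window=None):
    """Estimate KV-cache memory for one decoded sequence.

    Keys and values are each stored per layer and per KV head; a sliding
    window (or, in the paper's setting, a fixed-size encoded cache) caps
    the effective sequence length at `window`.
    """
    eff_len = seq_len if window is None else min(seq_len, window)
    return 2 * layers * kv_heads * head_dim * eff_len * dtype_bytes  # K and V

# Assumed shape: 28 layers, 4 KV heads, head_dim 128, fp16, 32k-token rollout.
full = kv_cache_bytes(32_768, layers=28, kv_heads=4, head_dim=128)
capped = kv_cache_bytes(32_768, layers=28, kv_heads=4, head_dim=128, window=1024)
print(full / capped)  # → 32.0, i.e. a 32x memory reduction for this rollout
```

The ratio scales with rollout length, which is why long RL rollouts for outcome-based rewards make the unbounded cache the dominant cost, and why bounding it without discarding context is the crux of the paper.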


