[2603.01641] Learning Structured Reasoning via Tractable Trajectory Control

[2603.01641] Learning Structured Reasoning via Tractable Trajectory Control

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.01641: Learning Structured Reasoning via Tractable Trajectory Control

Computer Science > Artificial Intelligence arXiv:2603.01641 (cs) [Submitted on 2 Mar 2026] Title:Learning Structured Reasoning via Tractable Trajectory Control Authors:Po-Nien Kung, Zhen Yang, Jeffrey Luo, Cheng-Fu Yang, Haikang Deng, Zi-Yi Dou, Yinfei Yang, Nanyun Peng, Zhe Gan, Kai-Wei Chang View a PDF of the paper titled Learning Structured Reasoning via Tractable Trajectory Control, by Po-Nien Kung and 9 other authors View PDF HTML (experimental) Abstract:Large language models can exhibit emergent reasoning behaviors, often manifested as recurring lexical patterns (e.g., "wait," indicating verification). However, complex reasoning trajectories remain sparse in unconstrained sampling, and standard RL often fails to guarantee the acquisition of diverse reasoning behaviors. We propose a systematic discovery and reinforcement of diverse reasoning patterns through structured reasoning, a paradigm that requires targeted exploration of specific reasoning patterns during the RL process. To this end, we propose Ctrl-R, a framework for learning structured reasoning via tractable trajectory control that actively guides the rollout process, incentivizing the exploration of diverse reasoning patterns that are critical for complex problem-solving. The resulting behavior policy enables accurate importance-sampling estimation, supporting unbiased on-policy optimization. We further introduce a power-scaling factor on the importance-sampling weights, allowing the policy to selectively l...

Originally published on March 03, 2026. Curated by AI News.

Related Articles

[2602.06869] Uncovering Cross-Objective Interference in Multi-Objective Alignment
Llms

[2602.06869] Uncovering Cross-Objective Interference in Multi-Objective Alignment

Abstract page for arXiv paper 2602.06869: Uncovering Cross-Objective Interference in Multi-Objective Alignment

arXiv - Machine Learning · 3 min ·
[2512.14954] Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
Llms

[2512.14954] Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

Abstract page for arXiv paper 2512.14954: Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

arXiv - Machine Learning · 4 min ·
[2603.08022] Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
Llms

[2603.08022] Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

Abstract page for arXiv paper 2603.08022: Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

arXiv - Machine Learning · 4 min ·
[2505.00753] LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
Llms

[2505.00753] LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

Abstract page for arXiv paper 2505.00753: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

arXiv - Machine Learning · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime