[2603.01641] Learning Structured Reasoning via Tractable Trajectory Control

arXiv - AI · March 03, 2026 · 3 min read

About this article

Abstract page for arXiv paper 2603.01641: Learning Structured Reasoning via Tractable Trajectory Control

arXiv:2603.01641 (cs), Computer Science > Artificial Intelligence. Submitted on 2 Mar 2026.

Title: Learning Structured Reasoning via Tractable Trajectory Control

Authors: Po-Nien Kung, Zhen Yang, Jeffrey Luo, Cheng-Fu Yang, Haikang Deng, Zi-Yi Dou, Yinfei Yang, Nanyun Peng, Zhe Gan, Kai-Wei Chang

Abstract: Large language models can exhibit emergent reasoning behaviors, often manifested as recurring lexical patterns (e.g., "wait," indicating verification). However, complex reasoning trajectories remain sparse in unconstrained sampling, and standard RL often fails to guarantee the acquisition of diverse reasoning behaviors. We propose a systematic discovery and reinforcement of diverse reasoning patterns through structured reasoning, a paradigm that requires targeted exploration of specific reasoning patterns during the RL process. To this end, we propose Ctrl-R, a framework for learning structured reasoning via tractable trajectory control that actively guides the rollout process, incentivizing the exploration of diverse reasoning patterns that are critical for complex problem-solving. The resulting behavior policy enables accurate importance-sampling estimation, supporting unbiased on-policy optimization. We further introduce a power-scaling factor on the importance-sampling weights, allowing the policy to selectively l...
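The abstract is cut off before it says how the power-scaling factor is applied, so the following is only a generic sketch under our own assumptions, not Ctrl-R's implementation. It treats the factor as a hypothetical exponent `alpha` on a sequence-level importance ratio between the trained policy and the behavior policy that generated the rollouts, and plugs the scaled weights into a plain Monte Carlo estimate of expected reward. The function names `importance_weight` and `power_scaled_is_estimate` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# One rollout: (target-policy token log-probs, behavior-policy token log-probs, reward).
# This data layout is a hypothetical choice for illustration.
Rollout = Tuple[Sequence[float], Sequence[float], float]


def importance_weight(target_logprobs: Sequence[float],
                      behavior_logprobs: Sequence[float]) -> float:
    """Sequence-level ratio pi_theta(y|x) / mu(y|x).

    Summing token log-probs gives each sequence's log-probability,
    so the ratio is exp(sum(log pi) - sum(log mu)).
    """
    if len(target_logprobs) != len(behavior_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return math.exp(sum(target_logprobs) - sum(behavior_logprobs))


def power_scaled_is_estimate(rollouts: List[Rollout], alpha: float = 1.0) -> float:
    """Estimate E_{y ~ pi_theta}[R(y)] from rollouts sampled under mu.

    Each weight w is replaced by w ** alpha (an assumed form of power scaling):
      alpha = 1     -> ordinary, unbiased importance sampling
      alpha = 0     -> every weight becomes 1 (no off-policy correction)
      0 < alpha < 1 -> weights above and below 1 are pulled toward 1
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    total = 0.0
    for target_lp, behavior_lp, reward in rollouts:
        w = importance_weight(target_lp, behavior_lp)
        total += (w ** alpha) * reward
    return total / len(rollouts)


if __name__ == "__main__":
    # Two single-token rollouts drawn from a behavior policy mu.
    rollouts = [
        ([math.log(0.5)], [math.log(0.25)], 1.0),  # w = 0.5 / 0.25 = 2.0
        ([math.log(0.5)], [math.log(0.75)], 0.0),  # w = 0.5 / 0.75 ~ 0.667
    ]
    for a in (1.0, 0.5, 0.0):
        print(f"alpha={a}: estimate={power_scaled_is_estimate(rollouts, alpha=a):.4f}")
```

With `alpha = 1` this is the standard unbiased importance-sampling estimator; exponents below 1 compress extreme weights, which typically lowers variance at the cost of some bias. Whether the paper uses the factor in this way cannot be determined from the truncated abstract.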

Originally published on March 03, 2026. Curated by AI News.

LLMs

[2602.06869] Uncovering Cross-Objective Interference in Multi-Objective Alignment

Abstract page for arXiv paper 2602.06869: Uncovering Cross-Objective Interference in Multi-Objective Alignment

arXiv - Machine Learning · 3 min

LLMs

[2512.14954] Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

Abstract page for arXiv paper 2512.14954: Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

arXiv - Machine Learning · 4 min

LLMs

[2603.08022] Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

Abstract page for arXiv paper 2603.08022: Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

arXiv - Machine Learning · 4 min

LLMs

[2505.00753] LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

Abstract page for arXiv paper 2505.00753: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

arXiv - Machine Learning · 4 min
