[2603.01481] Harmonizing Dense and Sparse Signals in Multi-turn RL:

[2603.01481] Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

arXiv - AI March 03, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.01481: Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Computer Science > Artificial Intelligence arXiv:2603.01481 (cs) [Submitted on 2 Mar 2026] Title:Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents Authors:Haojin Yang, Ai Jian, Xinyue Huang, Yiwei Wang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Jingqing Ruan View a PDF of the paper titled Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents, by Haojin Yang and 7 other authors View PDF HTML (experimental) Abstract:Optimizing large language models for industrial sales requires balancing long-term commercial objectives (e.g., conversion rate) with immediate linguistic constraints such as fluency and compliance. Conventional reinforcement learning often merges these heterogeneous goals into a single reward, causing high-magnitude session-level rewards to overwhelm subtler turn-level signals, which leads to unstable training or reward hacking. To address this issue, we propose Dual-Horizon Credit Assignment (DuCA), a framework that disentangles optimization across time scales. Its core, Horizon-Independent Advantage Normalization (HIAN), separately normalizes advantages from turn-level and session-level rewards before fusion, ensuring balanced gradient contributions from both immediate and long-term objectives to the policy update. Extensive experiments with a high-fidelity user simulator show DuCA outperforms the state-of-the-art GRPO baseline, achieving a 6...

Originally published on March 03, 2026. Curated by AI News.

Llms

[2602.07238] Is there "Secret Sauce'' in Large Language Model Development?

Abstract page for arXiv paper 2602.07238: Is there "Secret Sauce'' in Large Language Model Development?

arXiv - Machine Learning · 3 min · about 4 hours ago

Llms

[2602.01203] Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

Abstract page for arXiv paper 2602.01203: Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

arXiv - Machine Learning · 4 min · about 4 hours ago

Llms

[2601.01322] LinMU: Multimodal Understanding Made Linear

Abstract page for arXiv paper 2601.01322: LinMU: Multimodal Understanding Made Linear

arXiv - Machine Learning · 4 min · about 4 hours ago

Llms

[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Abstract page for arXiv paper 2512.05525: Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

arXiv - Machine Learning · 4 min · about 4 hours ago

[2603.01481] Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

About this article

Related Articles

[2602.07238] Is there "Secret Sauce'' in Large Language Model Development?

[2602.01203] Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

[2601.01322] LinMU: Multimodal Understanding Made Linear

[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

No comments

Stay updated with AI News