[2605.00425] AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
Computer Science > Artificial Intelligence
arXiv:2605.00425 (cs)
[Submitted on 1 May 2026]

Title: AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
Authors: Haotian Zhao, Yuxin Zhang, Songlin Zhou, Stephen S.-T. Yau, Wenyu Zhang, Lun Tian, Tianshu Zhu, Yifeng Huang, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu

Abstract: Reinforcement learning (RL) has significantly advanced the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. Yet effective training remains challenging, as sparse, outcome-only rewards make it difficult to assign credit to individual steps in an agent's action trajectory. A common remedy is to introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, but this increases supervision and tuning complexity and often generalizes poorly across tasks and domains. This paper presents AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to achieve a more effective exploration-exploitation trade-off. Theoretically, we elevate entropy analysis from the token level to the response level to reduce token sampling variance and show that entropy drift under natural gradients is intrinsically governed by the ...
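The abstract's distinction between token-level and response-level entropy can be illustrated with a minimal sketch. This is not the paper's implementation (the full text is not shown here); it is a hedged illustration under simple assumptions: token-level entropy averages the entropy of each per-step next-token distribution, while a response-level estimate treats each sampled response as a single event and averages its negative log-likelihood, which smooths over per-token sampling variance.

```python
import math

def token_level_entropy(step_dists):
    """Mean entropy of per-step next-token distributions.

    `step_dists` is a list of probability vectors, one per generation step.
    This quantity depends on every step's full distribution and fluctuates
    with the particular tokens sampled along the trajectory.
    """
    ents = [-sum(p * math.log(p) for p in dist if p > 0.0)
            for dist in step_dists]
    return sum(ents) / len(ents)

def response_level_entropy(responses_logprobs):
    """Monte-Carlo entropy estimate over whole responses.

    `responses_logprobs` is a list of responses, each a list of per-token
    log-probabilities under the policy. The negative log-likelihood of a
    full response is one sample of the response-level entropy; averaging
    over responses reduces the variance relative to per-token estimates.
    """
    nlls = [-sum(lps) for lps in responses_logprobs]
    return sum(nlls) / len(nlls)

# Toy check: a uniform binary choice at each of two steps.
uniform = [0.5, 0.5]
tok_ent = token_level_entropy([uniform, uniform])        # log(2) per step
resp_ent = response_level_entropy([[math.log(0.5), math.log(0.5)]])
```

In this toy case the response-level estimate of a two-step uniform response equals 2·log 2, the sum of the per-step entropies, but with real sampled trajectories the per-response average exhibits lower variance than summing noisy per-token terms, which is the motivation the abstract gives for lifting the analysis to the response level.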