[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Computer Science > Machine Learning

arXiv:2603.02216 (cs) [Submitted on 10 Feb 2026]

Title: ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Authors: Ruike Cao, Shaojie Bai, Fugen Yao, Liang Dong, Jian Xu, Li Xiao

Abstract: Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the uncertainty inherent in user-agent interactions, which we formulate as a Hierarchical Markov Decision Process (H-MDP). Conventional Reinforcement Learning (RL) methods struggle in this context: Group Relative Policy Optimization (GRPO) suffers from long-horizon credit assignment, and Proximal Policy Optimization (PPO) from unstable value estimation. We therefore propose a novel uncertainty-aware Adaptive Tree Policy Optimization (ATPO) algorithm. Our method adaptively allocates the rollout budget to states with high uncertainty, quantified by a composite metric of Bellman error and action-value variance. This strategy enables more accurate value estimation while fostering more efficient and diverse exploration. To mitigate the high computational cost of tree-based RL, we introduce two key optimizations: an uncertainty-guided pru...
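The core allocation idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the composite uncertainty score (here a weighted mix of normalized absolute Bellman error and action-value variance, with a hypothetical weight `alpha`) and the proportional budget-splitting rule are assumptions for the sketch.

```python
import numpy as np

def composite_uncertainty(bellman_errors, q_samples, alpha=0.5):
    """Per-state uncertainty: weighted mix of |Bellman error| and
    action-value variance, each min-max normalized across states.
    The exact weighting in ATPO is not specified here; alpha is
    a hypothetical knob."""
    be = np.abs(np.asarray(bellman_errors, dtype=float))
    var = np.var(np.asarray(q_samples, dtype=float), axis=1)

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    return alpha * norm(be) + (1.0 - alpha) * norm(var)

def allocate_rollouts(uncertainty, total_budget, min_per_state=1):
    """Give every state a minimum rollout count, then distribute the
    remaining budget proportionally to its uncertainty score."""
    n = len(uncertainty)
    base = np.full(n, min_per_state, dtype=int)
    remaining = total_budget - base.sum()
    if remaining <= 0:
        return base
    total = uncertainty.sum()
    w = uncertainty / total if total > 0 else np.full(n, 1.0 / n)
    extra = np.floor(w * remaining).astype(int)
    # hand leftover rollouts (from flooring) to the most uncertain states
    leftover = remaining - extra.sum()
    order = np.argsort(-uncertainty)
    for i in range(leftover):
        extra[order[i % n]] += 1
    return base + extra

# Example: three states; state 1 is most uncertain and gets most rollouts.
u = composite_uncertainty(
    bellman_errors=[0.1, 0.9, 0.5],
    q_samples=[[1.0, 1.0, 1.0], [0.0, 2.0, 4.0], [1.0, 1.5, 2.0]],
)
alloc = allocate_rollouts(u, total_budget=10)
```

Under this sketch, the budget concentrates on states where the value estimate is least trustworthy, which is the intuition behind ATPO's adaptive allocation.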