[2604.06159] Target Policy Optimization

Computer Science > Machine Learning
arXiv:2604.06159 (cs)
[Submitted on 7 Apr 2026]

Title: Target Policy Optimization
Authors: Jean Kaddour

Abstract: In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mass, and how should the parameters move to realize that change? Standard policy-gradient methods answer both at once, so the update can overshoot or undershoot depending on the learning rate, clipping, and other optimizer choices. We introduce Target Policy Optimization (TPO), which separates the two questions. Given scored completions, TPO constructs a target distribution $q_i \propto p_i^{\mathrm{old}} \exp(u_i)$ and fits the policy to it by cross-entropy. The loss gradient on sampled-completion logits is $p^\theta - q$, which vanishes once the policy matches the target. On tabular bandits, transformer sequence tasks, and billion-parameter LLM RLVR, TPO matches PG, PPO, GRPO, and DG on easy tasks and substantially outperforms them under sparse reward. Code is available at this https URL.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2604.06159 [cs.LG] (or arXiv:2604.06159v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.06159 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From: Jean Kaddour ...
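The target construction and the gradient identity in the abstract can be illustrated with a small numerical sketch. This is not the paper's released code. It assumes that the policy probabilities are a softmax over one logit per sampled completion within the group, and it takes the utilities u_i as given, since the abstract does not say how they are derived from the scores. All function names and numeric values below are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tpo_target(p_old, u):
    """Target q_i proportional to p_old_i * exp(u_i), normalized over the group.

    Computed in log space: log q_i = log p_old_i + u_i - log Z.
    """
    return softmax([math.log(p) + ui for p, ui in zip(p_old, u)])


def cross_entropy(q, logits):
    """Cross-entropy -sum_i q_i log p_i, with p = softmax(logits)."""
    p = softmax(logits)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))


def logit_gradient(q, logits):
    """Closed-form gradient of the cross-entropy w.r.t. the logits: p - q."""
    p = softmax(logits)
    return [pi - qi for pi, qi in zip(p, q)]


def numeric_gradient(q, logits, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        grads.append((cross_entropy(q, up) - cross_entropy(q, down)) / (2 * eps))
    return grads


# Hypothetical group of four sampled completions (values are made up).
logits_old = [0.5, 0.0, -0.5, 1.0]
p_old = softmax(logits_old)
u = [1.0, -1.0, 0.0, 0.5]  # assumed utilities; higher means "should gain mass"

q = tpo_target(p_old, u)
analytic = logit_gradient(q, logits_old)
numeric = numeric_gradient(q, logits_old)

# Once the policy equals the target, the gradient is zero.
target_logits = [math.log(qi) for qi in q]
at_target = logit_gradient(q, target_logits)

print("target q:", [round(x, 4) for x in q])
print("max |analytic - numeric|:", max(abs(a - n) for a, n in zip(analytic, numeric)))
print("max |gradient at target|:", max(abs(g) for g in at_target))
```

In this sketch the completion with the largest utility gains probability mass under q and the one with the smallest utility loses it. The finite-difference check agrees with the closed form p - q, and the gradient is zero (up to floating-point error) when the logits reproduce the target, which is the fixed-point behavior the abstract describes.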

Originally published on April 08, 2026. Curated by AI News.
