[2604.06159] Target Policy Optimization
Computer Science > Machine Learning
arXiv:2604.06159 (cs)
[Submitted on 7 Apr 2026]

Title: Target Policy Optimization
Authors: Jean Kaddour

Abstract: In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mass, and how should the parameters move to realize that change? Standard policy-gradient methods answer both at once, so the update can overshoot or undershoot depending on the learning rate, clipping, and other optimizer choices. We introduce \emph{Target Policy Optimization} (TPO), which separates the two questions. Given scored completions, TPO constructs a target distribution $q_i \propto p_i^{\,\mathrm{old}} \exp(u_i)$ and fits the policy to it by cross-entropy. The loss gradient on sampled-completion logits is $p^\theta - q$, which vanishes once the policy matches the target. On tabular bandits, transformer sequence tasks, and billion-parameter LLM RLVR, TPO matches PG, PPO, GRPO, and DG on easy tasks and substantially outperforms them under sparse reward. Code is available at this https URL.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2604.06159 [cs.LG] (or arXiv:2604.06159v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.06159
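The abstract describes the mechanism concretely enough for a toy illustration: build a target distribution $q_i \propto p_i^{\mathrm{old}} \exp(u_i)$ over a group of sampled completions, then fit the policy to it with a cross-entropy loss whose gradient on the logits is $p^\theta - q$. The sketch below, in PyTorch, follows only that description; the group size, the score vector `u`, and normalizing $q$ over the sampled group are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a TPO-style update on one group of sampled completions,
# based only on the abstract (assumptions: group size, scores u, normalization
# of q over the sampled group).
import torch

torch.manual_seed(0)
G = 4  # number of sampled completions in the group (assumed)

logits = torch.randn(G, requires_grad=True)            # current policy logits for the G completions
old_log_p = torch.log_softmax(logits.detach(), dim=0)  # p^old: probabilities at sampling time (frozen)
u = torch.tensor([1.0, 0.0, 0.0, -1.0])                # scores u_i for each completion (assumed values)

# Target distribution: q_i proportional to p_i^old * exp(u_i), normalized over the group.
q = torch.softmax(old_log_p + u, dim=0)

# Fit the policy to the target by cross-entropy.
log_p = torch.log_softmax(logits, dim=0)
loss = -(q * log_p).sum()
loss.backward()

# The gradient on the logits is p^theta - q, so it vanishes once the policy matches q.
print(torch.allclose(logits.grad, torch.softmax(logits, dim=0) - q, atol=1e-6))  # True
```

The final check mirrors the abstract's claim: because the cross-entropy loss against a fixed target has logit gradient $p^\theta - q$, the update stops moving once the policy reaches the target, independent of how the scores were turned into $q$.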