[2006.04363] Mitigating Value Hallucination in Dyna Planning via Multistep Predecessor Models
Computer Science > Machine Learning
arXiv:2006.04363 (cs)
[Submitted on 8 Jun 2020 (v1), last revised 3 Apr 2026 (this version, v2)]

Title: Mitigating Value Hallucination in Dyna Planning via Multistep Predecessor Models
Authors: Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin Talvitie, Michael Bowling, Martha White

Abstract: Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, accurate models of environment dynamics are often difficult to learn, and even small model errors can cause Dyna agents to fail. In this paper, we highlight one potential cause of such failure, bootstrapping off the values of simulated states, and introduce a new Dyna algorithm that avoids it. We discuss a design space of Dyna algorithms based on two choices: using successor or predecessor models (simulating forwards or backwards), and using one-step or multi-step updates. Three of the four variants have been explored, but surprisingly the fourth has not: predecessor models with multi-step updates. We present the Hallucinated Value Hypothesis (HVH): updating the values of real states towards values of simulated states can result in misleading ...
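The key property of the fourth variant, as described above, is that a backward-simulated trajectory ends at a *real* state, so multi-step targets can bootstrap only on that real state's value rather than on any simulated state's value. A minimal sketch of how such targets could be computed in a tabular setting (the function name and interface are our own illustration, not the paper's implementation):

```python
def backward_multistep_targets(rewards, v_real, gamma):
    """Compute n-step return targets for a backward-simulated trajectory.

    rewards: rewards along the simulated path s_{-n} -> ... -> s_real,
             ordered from the earliest simulated step to the final step
             entering the real anchor state s_real.
    v_real:  value estimate of the REAL anchor state -- the only value
             that is bootstrapped on; simulated states' values never
             appear in any target, which is the point of this variant.
    gamma:   discount factor.

    Returns one target per simulated (state, action) pair: target i is
    the discounted sum of the remaining rewards plus gamma^(n-i) * v_real.
    """
    targets = []
    g = v_real
    # Accumulate returns from the real anchor backwards along the path.
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return list(reversed(targets))


# Example: a 3-step backward rollout with rewards [1, 0, 2] into a real
# state whose value estimate is 10, with gamma = 0.9.
targets = backward_multistep_targets([1.0, 0.0, 2.0], v_real=10.0, gamma=0.9)
# targets[2] = 2 + 0.9*10      = 11.0
# targets[1] = 0 + 0.9*11.0    = 9.9
# targets[0] = 1 + 0.9*9.9     = 9.91
```

Each target would then be used to update the Q-value of the corresponding simulated (state, action) pair; crucially, every target is anchored on `v_real`, so model error in the simulated states cannot leak in through bootstrapping.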