[2602.14351] WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
Summary
The paper presents WIMLE, a model-based reinforcement learning method that improves sample efficiency by learning stochastic, multi-modal world models with Implicit Maximum Likelihood Estimation (IMLE) and weighting synthetic transitions by their predicted confidence.
Why It Matters
WIMLE's approach to uncertainty-aware world modeling is significant for advancing reinforcement learning techniques, particularly in continuous control tasks where sample efficiency is crucial. By improving the stability and performance of model-based RL, it can lead to more effective AI systems in real-world applications.
Key Takeaways
- WIMLE improves sample efficiency by over 50% on the challenging Humanoid-run task.
- The method utilizes uncertainty-aware weighting to enhance model performance.
- It achieves competitive or better asymptotic performance than strong model-free and model-based baselines.
- WIMLE addresses common issues in model-based RL, such as compounding errors.
- The approach is evaluated on 40 continuous-control tasks spanning DeepMind Control, MyoSuite, and HumanoidBench.
Computer Science > Machine Learning
arXiv:2602.14351 (cs)
[Submitted on 15 Feb 2026]
Title: WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
Authors: Mehran Aghabozorgi, Alireza Moazeni, Yanshu Zhang, Ke Li
Abstract: Model-based reinforcement learning promises strong sample efficiency but often underperforms in practice due to compounding model error, unimodal world models that average over multi-modal dynamics, and overconfident predictions that bias learning. We introduce WIMLE, a model-based method that extends Implicit Maximum Likelihood Estimation (IMLE) to the model-based RL framework to learn stochastic, multi-modal world models without iterative sampling and to estimate predictive uncertainty via ensembles and latent sampling. During training, WIMLE weights each synthetic transition by its predicted confidence, preserving useful model rollouts while attenuating bias from uncertain predictions and enabling stable learning. Across $40$ continuous-control tasks spanning DeepMind Control, MyoSuite, and HumanoidBench, WIMLE achieves superior sample efficiency and competitive or better asymptotic performance than strong model-free and model-based baselines. Notably, on the challenging Humanoid-run task, WIMLE improves sample efficiency by over $50$...
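The abstract's uncertainty-aware weighting idea can be illustrated with a minimal sketch: disagreement among ensemble predictions for a synthetic transition is mapped to a confidence weight, which then scales that transition's contribution to the training loss. This is an illustrative sketch, not the paper's implementation; the exponential weighting rule and the `beta` temperature are assumptions here, and the paper additionally estimates uncertainty via IMLE latent sampling, which this toy example omits.

```python
import numpy as np

def confidence_weight(next_state_preds, beta=1.0):
    """Map ensemble disagreement to a confidence weight in (0, 1].

    next_state_preds: (n_members, state_dim) array holding each ensemble
    member's predicted next state for one synthetic transition.
    beta is a hypothetical temperature, not a value from the paper.
    """
    uncertainty = next_state_preds.std(axis=0).mean()  # member disagreement
    return float(np.exp(-beta * uncertainty))

def weighted_rollout_loss(td_errors, weights):
    """Confidence-weighted squared TD loss over synthetic transitions."""
    td_errors = np.asarray(td_errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((weights * td_errors**2).sum() / weights.sum())

# Agreeing ensemble keeps full weight; a disagreeing one is attenuated.
agree = np.ones((5, 3))
disagree = np.vstack([np.zeros((2, 3)), np.ones((3, 3))])
w_hi = confidence_weight(agree)     # 1.0: zero disagreement
w_lo = confidence_weight(disagree)  # < 1.0: members disagree
```

Down-weighting rather than discarding uncertain rollouts is the key design choice: synthetic data still contributes, but its bias is attenuated in proportion to the model's confidence.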