[2601.20071] Distributional value gradients for stochastic environments
Computer Science > Machine Learning

arXiv:2601.20071 (cs)

[Submitted on 27 Jan 2026 (v1), last revised 2 Mar 2026 (this version, v3)]

Title: Distributional value gradients for stochastic environments
Authors: Baptiste Debes, Tinne Tuytelaars

Abstract: Gradient-regularized value learning methods improve sample efficiency by leveraging learned models of transition dynamics and rewards to estimate return gradients. However, existing approaches, such as MAGE, struggle in stochastic or noisy environments, limiting their applicability. In this work, we address these limitations by extending distributional reinforcement learning on continuous state-action spaces to model not only the distribution over scalar state-action value functions but also over their gradients. We refer to this approach as Distributional Sobolev Training. Inspired by Stochastic Value Gradients (SVG), our method utilizes a one-step world model of reward and transition distributions implemented via a conditional Variational Autoencoder (cVAE). The proposed framework is sample-based and employs Max-sliced Maximum Mean Discrepancy (MSMMD) to instantiate the distributional Bellman operator. We prove that the Sobolev-augmented Bellman operator is a contraction with a unique fixed point, and highlight a fundamental smoothness trade-off underlying contraction in ...
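The abstract gives no architectural details for the world model; as a point of reference, a one-step model of the transition and reward distributions implemented as a conditional VAE typically looks like the minimal PyTorch sketch below. All module names, dimensions, and hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a one-step world model as a conditional VAE,
# assuming Gaussian latents and a squared-error decoder over (s', r).
# Names and sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class OneStepCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        cond = state_dim + action_dim           # condition on (s, a)
        target = state_dim + 1                  # predict (s', r)
        self.enc = nn.Sequential(
            nn.Linear(cond + target, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean, log-var of q(z | s, a, s', r)
        )
        self.dec = nn.Sequential(
            nn.Linear(cond + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, target),          # mean of p(s', r | s, a, z)
        )
        self.latent_dim = latent_dim

    def forward(self, s, a, s_next, r):
        """Return the negative ELBO for a batch; r has shape (batch, 1)."""
        cond = torch.cat([s, a], dim=-1)
        mu, logvar = self.enc(torch.cat([cond, s_next, r], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(torch.cat([cond, z], dim=-1))
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        rec = (recon - torch.cat([s_next, r], dim=-1)).pow(2).sum(-1).mean()
        return rec + kl

    def sample(self, s, a):
        """Draw a stochastic one-step prediction (s', r) given (s, a)."""
        z = torch.randn(s.shape[0], self.latent_dim)
        return self.dec(torch.cat([s, a, z], dim=-1))
```

The relevant design property for SVG-style methods is that sampled predictions stay differentiable with respect to (s, a) via the reparameterization trick, so return gradients can be backpropagated through the learned dynamics.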
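For context, the generic distributional Bellman operator that MSMMD is used to instantiate acts on return distributions as follows, in standard distributional-RL notation; the paper's Sobolev-augmented operator extends this to also track the distribution of the value gradient, and its exact form is given in the paper, not here:

\[
(\mathcal{T}^{\pi} Z)(s, a) \overset{D}{=} R(s, a) + \gamma\, Z(S', A'), \qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S'),
\]

where \(\overset{D}{=}\) denotes equality in distribution and \(Z\) is the random return. Roughly, differentiating both sides with respect to \((s, a)\) through a differentiable world model yields a Bellman-style target for the gradient distribution, which is the intuition behind the Sobolev augmentation.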
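The framework is described as sample-based with MSMMD as the discrepancy between predicted and target distributions. Below is a minimal, generic sketch of max-sliced MMD between two sample batches, assuming a Gaussian kernel and gradient ascent over the slicing direction; function names and hyperparameters are illustrative, and this is not the paper's actual loss.

```python
# A minimal sketch of Max-sliced MMD (MSMMD) between two sample batches:
# find the unit direction whose 1-D projections maximize the MMD.
import torch

def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased squared MMD between 1-D samples x, y under a Gaussian kernel."""
    def k(a, b):
        return torch.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def max_sliced_mmd(X, Y, steps=100, lr=0.1, bandwidth=1.0):
    """Gradient ascent on a slicing direction w to maximize projected MMD."""
    w = torch.randn(X.shape[1], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        u = w / w.norm()                                 # unit direction
        loss = -gaussian_mmd2(X @ u, Y @ u, bandwidth)   # ascend the MMD
        loss.backward()
        opt.step()
    with torch.no_grad():
        u = w / w.norm()
        return gaussian_mmd2(X @ u, Y @ u, bandwidth).sqrt()

# Example: two Gaussians differing in a single coordinate.
X = torch.randn(256, 8)
Y = torch.randn(256, 8)
Y[:, 0] += 1.0
print(float(max_sliced_mmd(X, Y)))
```

Max-slicing reduces the comparison to the single most discriminative one-dimensional projection, which helps keep a sample-based estimator discriminative as dimensionality grows; this is plausibly relevant here, since the Sobolev target carries a gradient vector rather than just a scalar return.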