[2512.23805] Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting




arXiv:2512.23805 (stat)

Submitted on 29 Dec 2025 (v1), last revised 21 Apr 2026 (this version, v2)

Title: Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

Authors: Lars van der Laan, Nathan Kallus

Abstract: Fitted Q-evaluation (FQE) is a foundational method for off-policy evaluation in reinforcement learning, but existing theory typically relies on Bellman completeness of the function class, a condition often violated in practice. This reliance is due to a fundamental norm mismatch: the Bellman operator is gamma-contractive in the L^2 norm induced by the target policy's stationary distribution, whereas standard FQE fits Bellman regressions under the behavior distribution. To resolve this mismatch, we reweight each Bellman regression step by an estimate of the stationary density ratio, inspired by emphatic weighting in temporal-difference learning. This makes the update behave as if it were performed under the target stationary distribution, restoring contraction without Bellman completeness while preserving the simplicity of regression-based evaluation. Illustrative experiments, including Baird's classical counterexample, show that stationary weighting can stabilize FQE under off-policy sampling.

Subjects: Machine Learning (stat.ML); Machine Learnin...
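The reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical two-state Markov reward process (states stand in for state-action pairs under the target policy), a single linear feature, population-level regressions instead of sampled data, and the true stationary density ratio rather than an estimate. The toy chain is a simple off-policy divergence example, not Baird's counterexample, and all names (weighted_fqe, stationary_distribution, mu, ratio) are illustrative.

```python
import numpy as np

def weighted_fqe(phi, P, r, gamma, mu, w, n_iters=200):
    """Population-level linear FQE with per-state regression weights w.

    phi: (n_states, n_features) feature matrix (assumed full column rank)
    P:   (n_states, n_states) transition matrix under the target policy
    mu:  distribution the regression data is drawn from (behavior)
    w:   importance weights applied to each regression example
    """
    theta = np.zeros(phi.shape[1])
    D = np.diag(mu * w)               # effective weighting of the regression
    A = phi.T @ D @ phi
    for _ in range(n_iters):
        # Bellman target built from the previous iterate
        target = r + gamma * P @ (phi @ theta)
        # Weighted least-squares fit of the target onto the features
        theta = np.linalg.solve(A, phi.T @ D @ target)
    return theta

def stationary_distribution(P, n_iters=10_000):
    """Stationary distribution of P by power iteration (assumes it is unique)."""
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iters):
        d = d @ P
    return d / d.sum()

# Hypothetical toy problem: one feature, so the function class is not
# Bellman complete, and the behavior distribution differs from stationary.
phi = np.array([[1.0], [2.0]])
P = np.array([[0.1, 0.9],
              [0.1, 0.9]])
r = np.array([0.0, 1.0])
gamma = 0.95
mu = np.array([0.5, 0.5])             # behavior distribution over states

d_pi = stationary_distribution(P)     # target stationary distribution
ratio = d_pi / mu                     # stationary density ratio (assumed known)

theta_plain = weighted_fqe(phi, P, r, gamma, mu, np.ones(2))
theta_weighted = weighted_fqe(phi, P, r, gamma, mu, ratio)

print("unweighted FQE iterate:", theta_plain)     # grows without bound
print("stationary-weighted iterate:", theta_weighted)
```

In this toy setting the unweighted update multiplies the parameter by about 1.08 per iteration and so diverges, while the stationary-weighted update has a multiplier of about 0.93, below gamma, and settles at a fixed point. With sampled data the ratio would itself have to be estimated, which this sketch does not attempt.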

Originally published on April 22, 2026. Curated by AI News.
