[2512.23805] Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

arXiv - Machine Learning April 22, 2026 3 min read

About this article

Abstract page for arXiv paper 2512.23805: Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

arXiv:2512.23805 (stat)

[Submitted on 29 Dec 2025 (v1), last revised 21 Apr 2026 (this version, v2)]

Title: Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

Authors: Lars van der Laan, Nathan Kallus

Abstract: Fitted Q-evaluation (FQE) is a foundational method for off-policy evaluation in reinforcement learning, but existing theory typically relies on Bellman completeness of the function class, a condition often violated in practice. This reliance is due to a fundamental norm mismatch: the Bellman operator is gamma-contractive in the L^2 norm induced by the target policy's stationary distribution, whereas standard FQE fits Bellman regressions under the behavior distribution. To resolve this mismatch, we reweight each Bellman regression step by an estimate of the stationary density ratio, inspired by emphatic weighting in temporal-difference learning. This makes the update behave as if it were performed under the target stationary distribution, restoring contraction without Bellman completeness while preserving the simplicity of regression-based evaluation. Illustrative experiments, including Baird's classical counterexample, show that stationary weighting can stabilize FQE under off-policy sampling.

Subjects: Machine Learning (stat.ML); Machine Learnin...
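The reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear function class, takes the stationary density ratio estimates as a given array (how they are estimated is not shown), and adds a small ridge term purely for numerical stability. The function name `weighted_fqe` and all parameter names are hypothetical.

```python
import numpy as np


def weighted_fqe(phi, phi_next, rewards, weights, gamma, n_iters=200, ridge=1e-8):
    """Linear FQE where each Bellman regression is a weighted least-squares fit.

    phi:      (n, d) features of the sampled (s, a) pairs
    phi_next: (n, d) features of (s', a') with a' chosen by the target policy
    rewards:  (n,)   observed rewards
    weights:  (n,)   stationary density ratio estimates (assumed given)
    """
    n, d = phi.shape
    theta = np.zeros(d)
    # The weighted Gram matrix does not change across iterations.
    # The ridge term is an assumption added here for numerical stability.
    gram = phi.T @ (weights[:, None] * phi) + ridge * np.eye(d)
    for _ in range(n_iters):
        # Bellman targets under the current Q estimate.
        targets = rewards + gamma * (phi_next @ theta)
        # Weighted regression of the targets onto the features.
        theta = np.linalg.solve(gram, phi.T @ (weights * targets))
    return theta


# Toy check: two states under a fixed policy, one-hot features.
# State 0 moves to state 1 with reward 1; state 1 moves to state 0 with reward 0.
phi = np.eye(2)
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])
rewards = np.array([1.0, 0.0])
weights = np.array([0.5, 2.0])  # arbitrary positive weights for illustration

theta = weighted_fqe(phi, phi_next, rewards, weights, gamma=0.5)
print(theta)  # close to the exact values [4/3, 2/3]
```

In this tabular toy case the function class is Bellman complete, so the weights do not change the fixed point and the sketch only confirms the mechanics. The abstract's claim concerns the misspecified case, where the choice of weights determines whether the projected update contracts.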

Originally published on April 22, 2026. Curated by AI News.



