[2604.01345] Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Computer Science > Machine Learning
arXiv:2604.01345 (cs) [Submitted on 1 Apr 2026]
Title: Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Authors: Vikram Krishnamurthy, Luke Snow
Abstract: Inverse reinforcement learning (IRL) recovers the loss function of a forward learner from its observed responses; adaptive IRL aims to reconstruct this loss function by passively observing the learner's gradients as it performs reinforcement learning (RL). This paper proposes a novel passive Langevin-based algorithm that achieves adaptive IRL. The key difficulty in adaptive IRL is that the required gradients in the passive algorithm are counterfactual, that is, they are conditioned on events of probability zero under the forward learner's trajectory. Therefore, naive Monte Carlo estimators are prohibitively inefficient, and kernel smoothing, though common, suffers from slow convergence. We overcome this by employing Malliavin calculus to efficiently estimate the required counterfactual gradients. We reformulate the counterfactual conditioning as a ratio of unconditioned expectations involving Malliavin quantities, thus recovering standard estimation rates. We derive the necessary Malliavin derivatives and their adjoint Skorokhod ...
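The ratio-of-expectations reformulation mentioned in the abstract follows the general pattern of the classical Malliavin representation of conditional expectations: E[f(X) | Y = y] = E[f(X) 1_{Y>y} δ(u)] / E[1_{Y>y} δ(u)], where δ(u) is the Skorokhod integral of a process u chosen so that ⟨DY, u⟩ = 1 and ⟨DX, u⟩ = 0. As a minimal illustrative sketch (not the paper's actual construction), consider conditioning W_1 on W_{1/2} = y for Brownian motion W: the deterministic choice u = 2 on [0, 1/2] and u = −2 on (1/2, 1] satisfies both constraints, and its Skorokhod integral reduces to the Wiener integral δ(u) = 4 W_{1/2} − 2 W_1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
y = 0.5  # conditioning point W_{1/2} = y (an event of probability zero)

# Brownian increments: A = W_{1/2}, B = W_1 - W_{1/2}, each N(0, 1/2)
A = rng.normal(0.0, np.sqrt(0.5), N)
B = rng.normal(0.0, np.sqrt(0.5), N)
Y = A        # conditioning variable W_{1/2}
X = A + B    # target variable W_1

# Skorokhod integral of the deterministic u (u = +2 on [0,1/2], -2 on (1/2,1]):
# delta(u) = 2*W_{1/2} - 2*(W_1 - W_{1/2}) = 4*W_{1/2} - 2*W_1
weight = 2.0 * A - 2.0 * B

f = lambda x: x  # test functional

# Ratio of two *unconditioned* expectations -- no kernel smoothing needed
indicator = (Y > y).astype(float)
est = np.mean(f(X) * indicator * weight) / np.mean(indicator * weight)

# Closed form for comparison: W_1 | W_{1/2} = y  ~  N(y, 1/2), so E[W_1 | .] = y
print(est)  # should be close to 0.5
```

Both numerator and denominator are ordinary Monte Carlo averages, so the estimator converges at the standard O(N^{-1/2}) rate, in contrast to the slower nonparametric rate of a kernel-smoothed estimate of the same conditional expectation.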