[2502.13022] Efficient and Sharp Off-Policy Learning under Unobserved Confounding
Summary
This paper presents a method for off-policy learning under unobserved confounding, improving the reliability of learned policies in high-stakes applications such as healthcare.
Why It Matters
Unobserved confounding can lead to biased estimates in policy learning, which is particularly detrimental in fields like healthcare and public policy. This research introduces a semi-parametrically efficient estimator that improves decision-making under such conditions, making it highly relevant for practitioners and researchers in machine learning and causal inference.
Key Takeaways
- Introduces a new estimator for off-policy learning that mitigates the effects of unobserved confounding.
- Proves that the proposed method leads to optimal confounding-robust policies.
- Demonstrates superior performance compared to existing methods through experiments with real-world data.
Computer Science > Machine Learning
arXiv:2502.13022 (cs)
[Submitted on 18 Feb 2025 (v1), last revised 17 Feb 2026 (this version, v3)]
Title: Efficient and Sharp Off-Policy Learning under Unobserved Confounding
Authors: Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel
Abstract: We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. In doing so, we address a key limitation of standard policy learning: it assumes unconfoundedness, meaning that no unobserved factors influence both treatment assignment and outcomes. However, this assumption is often violated in practice, in which case standard policy learning produces biased estimates and can yield harmful policies. To address this limitation, we employ causal sensitivity analysis and derive a semi-parametrically efficient estimator for a sharp bound on the value function under unobserved confounding. Our estimator has three advantages: (1) Unlike existing works, it avoids unstable minimax optimization based on inverse-propensity-weighted outcomes. (2) It is semi-parametrically efficient. (3) We prove that it leads to the optimal confounding-robust policy. Finally, we extend our theory to the ...
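To make the sensitivity-analysis setting concrete, the sketch below computes a worst-case bound on a policy's value under a marginal sensitivity model with parameter gamma, which limits how far the true (confounded) propensity can deviate from the nominal one in odds-ratio terms. This is a minimal, assumption-laden illustration of the simpler box-constrained inverse-propensity bound, i.e. the kind of valid-but-not-sharp baseline the paper improves upon; it is NOT the paper's semi-parametrically efficient estimator, and all function names here are hypothetical.

```python
# Illustrative sketch only: box-constrained worst-case bound under a marginal
# sensitivity model (MSM). The paper's contribution is a sharper, semi-
# parametrically efficient estimator; this shows the baseline idea it refines.
import numpy as np

def msm_weight_bounds(e, gamma):
    """Bounds on the inverse-propensity weight 1/P(A=1 | X, U) implied by an
    MSM with odds-ratio parameter gamma >= 1, given nominal propensities e."""
    lo = 1.0 + (1.0 / gamma) * (1.0 / e - 1.0)
    hi = 1.0 + gamma * (1.0 / e - 1.0)
    return lo, hi

def value_upper_bound(y, e, gamma):
    """sup of sum(w*y)/sum(w) over weights w_i in [lo_i, hi_i].

    This fractional linear program has a threshold solution: the optimum
    assigns the maximal weight to outcomes above some cutoff and the minimal
    weight below it, so scanning all cutoffs over the sorted outcomes finds it.
    """
    lo, hi = msm_weight_bounds(e, gamma)
    order = np.argsort(-y)              # sort outcomes in descending order
    y, lo, hi = y[order], lo[order], hi[order]
    best = -np.inf
    for k in range(len(y) + 1):         # k = number of samples at max weight
        w = np.concatenate([hi[:k], lo[k:]])
        best = max(best, float(np.dot(w, y) / w.sum()))
    return best
```

With gamma = 1 the weight interval collapses to the nominal weight 1/e and the bound reduces to the usual (Hájek-normalized) inverse-propensity estimate; larger gamma widens the interval and the bound grows monotonically, which is why such bounds are valid but conservative rather than sharp.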