[2602.14914] Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
Summary
This paper presents a theoretical analysis demonstrating that additive control variates outperform self-normalisation techniques in off-policy evaluation, particularly in ranking and recommendation systems.
Why It Matters
The findings challenge conventional methods in off-policy evaluation, suggesting a shift towards additive control variates for improved performance. This has significant implications for machine learning practitioners who need to evaluate ranking and recommendation systems without resorting to costly online experiments.
Key Takeaways
- Additive control variates provide superior performance in off-policy evaluation compared to self-normalised methods.
- The paper proves that the β*-IPS estimator asymptotically dominates SNIPS in Mean Squared Error.
- An analytical decomposition of the variance gap shows that SNIPS is asymptotically equivalent to a specific, but generally sub-optimal, additive baseline, motivating the move to optimal baseline corrections.
- The results are crucial for enhancing the efficiency of ranking and recommendation systems.
- Theoretical guarantees for additive methods may lead to broader adoption in practical applications.
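To make the comparison concrete, here is a minimal numerical sketch of the three estimators discussed: vanilla IPS, self-normalised IPS (SNIPS), and an additive-baseline estimator with the standard variance-optimal baseline β* = Cov(w, wr)/Var(w). The synthetic data (log-normal importance weights, a reward model loosely correlated with the weights) is an illustrative assumption and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic bandit feedback: importance weights w = pi(a|x) / pi0(a|x),
# drawn log-normal so that E[w] ~ exp(0.5); rewards loosely correlated with w.
# Both choices are illustrative assumptions, not the paper's setup.
w = rng.lognormal(mean=0.0, sigma=1.0, size=n)
p = np.clip(0.1 + 0.05 * np.log1p(w), 0.0, 1.0)
r = rng.binomial(1, p)

mean_w = np.mean(w)          # sample estimate of E[w] (equals 1 in expectation)

# Vanilla IPS: unbiased, but high variance under heavy-tailed weights.
ips = np.mean(w * r)

# SNIPS: multiplicative control variate (self-normalisation).
snips = np.sum(w * r) / np.sum(w)

# beta*-IPS: additive control variate with the variance-optimal baseline
# beta* = Cov(w, w r) / Var(w), estimated from the same sample.
beta_star = np.cov(w, w * r)[0, 1] / np.var(w)
beta_ips = np.mean(w * r - beta_star * (w - 1.0))

print(f"IPS:      {ips:.4f}")
print(f"SNIPS:    {snips:.4f}")
print(f"beta*-IPS: {beta_ips:.4f}")
```

Note that SNIPS and β*-IPS are both biased in finite samples but asymptotically unbiased; the paper's result concerns their asymptotic Mean Squared Error, where the optimal additive baseline dominates.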
Computer Science > Machine Learning
arXiv:2602.14914 (cs) [Submitted on 16 Feb 2026]
Title: Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
Authors: Olivier Jeunen, Shashank Gupta
Abstract: Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer: we prove that $\beta^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in Mean Squared Error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline corrections for both ranking and recommendation.
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Cite as: arXiv:2602.14914 [cs.LG] (or arXiv:2602.14914v1 [cs.LG] for this version)
https://do...