[2602.14914] Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation


arXiv - Machine Learning

Summary

This paper presents a theoretical analysis demonstrating that additive control variates outperform self-normalisation techniques in off-policy evaluation, particularly in ranking and recommendation systems.

Why It Matters

The findings challenge conventional methods in off-policy evaluation, suggesting a shift towards additive control variates for improved performance. This has significant implications for machine learning practitioners focused on optimizing recommendation systems without extensive online testing.

Key Takeaways

  • Additive control variates provide superior performance in off-policy evaluation compared to self-normalised methods.
  • The paper proves that the β*-IPS estimator asymptotically dominates SNIPS in Mean Squared Error.
  • An analytical decomposition of the variance gap shows that SNIPS is asymptotically equivalent to using a specific, but generally sub-optimal, additive baseline.
  • The results are crucial for enhancing the efficiency of ranking and recommendation systems.
  • Theoretical guarantees for additive methods may lead to broader adoption in practical applications.

Computer Science > Machine Learning

arXiv:2602.14914 (cs) [Submitted on 16 Feb 2026]

Title: Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
Authors: Olivier Jeunen, Shashank Gupta

Abstract: Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer: we prove that $\beta^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in Mean Squared Error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline corrections for both ranking and recommendation.

Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Cite as: arXiv:2602.14914 [cs.LG] (or arXiv:2602.14914v1 [cs.LG] for this version)
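The estimators compared in the abstract can be illustrated with a short numerical sketch on synthetic logged bandit data. The snippet below is an illustrative example, not the paper's code: it computes vanilla IPS, SNIPS (the multiplicative control variate), and an additive-baseline estimator in which the baseline is estimated by the standard control-variate coefficient, beta = Cov(w*r, w) / Var(w). That plug-in form is an assumption here, chosen to mimic the optimal additive baseline the paper analyses; the exact definition of $\beta^\star$ is given in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100_000, 5

# Synthetic logged bandit feedback under a uniform logging policy.
actions = rng.integers(0, K, size=n)
mean_reward = np.linspace(0.1, 0.9, K)            # per-action mean rewards
rewards = rng.binomial(1, mean_reward[actions]).astype(float)

# Target policy: puts extra mass 0.6 on the best action, uniform elsewhere.
pi_t = np.full(K, 0.4 / K)
pi_t[K - 1] += 0.6
pi_b = np.full(K, 1.0 / K)                        # logging policy

w = pi_t[actions] / pi_b[actions]                 # importance weights

# Vanilla IPS: unbiased but high-variance.
ips = np.mean(w * rewards)

# SNIPS: normalise by the empirical mean weight (multiplicative control variate).
snips = np.sum(w * rewards) / np.sum(w)

# Additive baseline: w - 1 has expectation zero under the logging policy,
# so subtracting beta * (w - 1) keeps the estimator asymptotically unbiased.
# beta is estimated as Cov(w*r, w) / Var(w), the variance-minimising choice.
beta = np.cov(w * rewards, w)[0, 1] / np.var(w, ddof=1)
beta_ips = np.mean(w * rewards - beta * (w - 1.0))

true_value = float(pi_t @ mean_reward)            # ground-truth policy value
print(ips, snips, beta_ips, true_value)
```

With a large sample all three estimates land near the true policy value; the paper's contribution is the asymptotic MSE comparison between the self-normalised and additive corrections, which a single synthetic run like this cannot establish on its own.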
