[2507.11891] Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
Summary
This paper explores the impact of data sharing on A/B experiments in recommendation systems, focusing on how interference affects algorithm performance evaluation under a multi-armed bandit framework.
Why It Matters
Understanding the implications of data sharing in A/B testing is crucial for practitioners in machine learning and recommendation systems. This research addresses potential biases in algorithm comparisons, providing insights that can enhance decision-making processes in real-world applications.
Key Takeaways
- Data sharing can lead to biased estimates in A/B experiments.
- The stable unit treatment value assumption (SUTVA) may not hold in large-scale systems.
- The level of exploration versus exploitation is critical in algorithm evaluation.
- A detection procedure based on ramp-up experiments can identify incorrect comparisons.
- Understanding interference is essential for accurate algorithm performance assessment.
Statistics > Machine Learning
arXiv:2507.11891 (stat)
[Submitted on 16 Jul 2025 (v1), last revised 23 Feb 2026 (this version, v2)]
Authors: Shuangning Li, Chonghuan Wang, Jingyan Wang
Abstract: We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation systems, and hence the estimate of the global treatment effect (GTE) is biased. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently trains both algorithms, resulting in interference between the two groups. In this paper, we investigate when such interference may affect our decision making on which algorithm is better. We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the difference-in-means estimator of the GTE under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how data sharing impacts decision making, and we propose a detection procedure based on...
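The interference mechanism the abstract describes can be made concrete with a small simulation. The sketch below is a hypothetical setup, not the paper's actual experiment: units arrive one at a time, are randomized between two epsilon-greedy bandit variants (differing only in their exploration rate `eps_a` vs. `eps_b`), and every observed reward is fed into a single shared pool of arm estimates that both variants exploit. The returned value is the difference-in-means estimate that an A/B experiment would report; all parameter values are illustrative.

```python
import random

def run_shared_experiment(eps_a, eps_b, true_means, horizon, seed=0):
    """Simulate an A/B test of two epsilon-greedy bandits under data sharing.

    Hypothetical sketch: both groups update the SAME arm estimates,
    so one group's data shifts the other group's future choices --
    the interference (SUTVA violation) studied in the paper.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Shared data pool: one running mean per arm, trained on all units.
    counts = [0] * n_arms
    est_means = [0.0] * n_arms
    rewards = {"A": [], "B": []}

    def pick(eps):
        # Explore with probability eps (or if nothing observed yet).
        if rng.random() < eps or all(c == 0 for c in counts):
            return rng.randrange(n_arms)
        # Otherwise exploit the shared estimates.
        return max(range(n_arms), key=lambda a: est_means[a])

    for _ in range(horizon):
        group = "A" if rng.random() < 0.5 else "B"  # 50/50 randomization
        arm = pick(eps_a if group == "A" else eps_b)
        r = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli reward
        rewards[group].append(r)
        # Shared update: this observation trains BOTH algorithms.
        counts[arm] += 1
        est_means[arm] += (r - est_means[arm]) / counts[arm]

    avg = {g: sum(v) / max(len(v), 1) for g, v in rewards.items()}
    return avg["A"] - avg["B"]  # difference-in-means estimate of the GTE

diff = run_shared_experiment(eps_a=0.05, eps_b=0.5,
                             true_means=[0.3, 0.7], horizon=10_000)
print(f"difference-in-means estimate under data sharing: {diff:+.3f}")
```

Comparing this estimate against two separate runs in which each variant maintains its own (unshared) estimates would illustrate the paper's point: under sharing, the low-exploration group free-rides on the high-exploration group's data, so the estimated difference need not match the sign of the true GTE.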