[2502.09257] From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
Summary
This paper studies contextual combinatorial semi-bandits and presents an algorithm with improved sample complexity in sparse-reward settings, a regime particularly relevant to recommendation systems.
Why It Matters
Contextual combinatorial semi-bandits underlie applications such as recommendation systems, where the learner observes feedback only for the items it actually shows. The improved sample complexity established here means learning algorithms can reach near-optimal policies with substantially less data, leading to more efficient decision-making in such applications.
Key Takeaways
- Introduces an algorithm that achieves improved sample complexity for contextual combinatorial semi-bandits.
- Focuses on the $s$-sparse regime, relevant for real-world applications like recommendation systems.
- Proves significant improvements in sample complexity bounds compared to existing methods.
- Extends the framework to list multiclass classification with bandit feedback.
- Establishes new regret bounds for adversarial data generation scenarios.
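The interaction protocol behind these takeaways can be sketched in a few lines: per round, a context determines a sparse reward vector, the learner predicts a size-$m$ list out of $K$ actions, and semi-bandit feedback reveals the realized reward of each predicted action only. The reward model, parameter values, and `random_policy` below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K, m, s = 100, 5, 3  # K actions, predicted lists of size m, at most s positive rewards

def sparse_rewards(context_id):
    """Hypothetical reward model: for each context, at most s of the K actions
    yield reward 1 (e.g. the few products this customer would actually buy)."""
    ctx_rng = np.random.default_rng(context_id)
    positives = ctx_rng.choice(K, size=s, replace=False)
    r = np.zeros(K)
    r[positives] = 1.0
    return r

def play_round(policy, context_id):
    """Semi-bandit feedback: the learner commits to a size-m subset and then
    observes the realized reward of each predicted action (and nothing else)."""
    chosen = policy(context_id)      # indices of the m predicted actions
    r = sparse_rewards(context_id)
    return chosen, r[chosen]         # only rewards of the chosen actions are revealed

def random_policy(context_id):
    """Trivial uniform policy, just to exercise the protocol."""
    return rng.choice(K, size=m, replace=False)

chosen, feedback = play_round(random_policy, context_id=3)
assert len(feedback) == m
assert feedback.sum() <= s           # observed rewards are s-sparse by construction
```

The $s$-sparsity assumption is visible in the final assertion: no matter which list is shown, the total observed reward in a round never exceeds $s$.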
Computer Science > Machine Learning
arXiv:2502.09257 (cs)
[Submitted on 13 Feb 2025 (v1), last revised 23 Feb 2026 (this version, v4)]
Title: From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
Authors: Liad Erez, Tomer Koren
Abstract: We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime, where the sum of rewards is assumed to be bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\epsilon,\delta)$-PAC variant of the problem, for which we design an algorithm that returns an $\epsilon$-optimal policy with high probability using a sample complexity of $\tilde{O}((\mathrm{poly}(K/m)+sm/\epsilon^2) \log(|\Pi|/\delta))$, where $\Pi$ is the underlying (finite) policy class and $s$ is the sparsity parameter. This bound improves upon known bounds for combinatorial semi-bandits whene...
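To get a feel for why sparsity matters in the stated bound, the back-of-the-envelope sketch below evaluates its $sm/\epsilon^2 \cdot \log(|\Pi|/\delta)$ term under illustrative parameter values, and compares it against the same term with the sparsity level replaced by $K$ (i.e., what the term would cost if rewards were not sparse). All numbers, including the assumed size of $\log|\Pi|$, are made up for illustration; the baseline is this comparison only, not a bound from the literature.

```python
import math

# Illustrative (assumed) parameter values; not taken from the paper.
K, m = 10_000, 10           # catalogue size, recommendation-list length
s = 3                       # sparsity: expected purchases per customer, s << K
eps, delta = 0.05, 0.01
log_Pi = 20 * math.log(K)   # assumed log of the policy-class size, log|Pi|

def dominant_term(sparsity):
    """The sparsity * m / eps^2 * log(|Pi|/delta) part of the stated bound."""
    return sparsity * m / eps**2 * (log_Pi + math.log(1 / delta))

sparse = dominant_term(s)   # cost under s-sparse rewards
dense = dominant_term(K)    # same term if sparsity were as large as K
print(f"sparse: {sparse:.3g}, dense: {dense:.3g}, saving: {dense / sparse:.0f}x")
```

Since the term is linear in the sparsity level, the saving is exactly the factor $K/s$, which is large precisely in the recommendation-system regime the paper targets.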