[2502.09257] From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
Summary
This paper studies contextual combinatorial semi-bandits and presents an algorithm with improved sample complexity in sparse-reward settings, a regime particularly relevant to recommendation systems.
Why It Matters
Contextual combinatorial semi-bandits underlie applications such as recommendation systems, where the learner observes feedback only for the items it actually shows. The improved sample complexity established here means learning algorithms can reach near-optimal policies with substantially less data, leading to more efficient decision-making in such applications.
Key Takeaways
- Introduces an algorithm that achieves improved sample complexity for contextual combinatorial semi-bandits.
- Focuses on the $s$-sparse regime, relevant for real-world applications like recommendation systems.
- Proves significant improvements in sample complexity bounds compared to existing methods.
- Extends the framework to list multiclass classification with bandit feedback.
- Establishes new regret bounds for adversarial data generation scenarios.
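The interaction protocol behind these takeaways can be sketched in a few lines: per round, a context determines a sparse reward vector, the learner predicts a size-$m$ list out of $K$ actions, and semi-bandit feedback reveals the realized reward of each predicted action only. The reward model, parameter values, and `random_policy` below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K, m, s = 100, 5, 3  # K actions, predicted lists of size m, at most s positive rewards

def sparse_rewards(context_id):
    """Hypothetical reward model: for each context, at most s of the K actions
    yield reward 1 (e.g. the few products this customer would actually buy)."""
    ctx_rng = np.random.default_rng(context_id)
    positives = ctx_rng.choice(K, size=s, replace=False)
    r = np.zeros(K)
    r[positives] = 1.0
    return r

def play_round(policy, context_id):
    """Semi-bandit feedback: the learner commits to a size-m subset and then
    observes the realized reward of each predicted action (and nothing else)."""
    chosen = policy(context_id)      # indices of the m predicted actions
    r = sparse_rewards(context_id)
    return chosen, r[chosen]         # only rewards of the chosen actions are revealed

def random_policy(context_id):
    """Trivial uniform policy, just to exercise the protocol."""
    return rng.choice(K, size=m, replace=False)

chosen, feedback = play_round(random_policy, context_id=3)
assert len(feedback) == m
assert feedback.sum() <= s           # observed rewards are s-sparse by construction
```

The $s$-sparsity assumption is visible in the final assertion: no matter which list is shown, the total observed reward in a round never exceeds $s$.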
Computer Science > Machine Learning
arXiv:2502.09257 (cs)
[Submitted on 13 Feb 2025 (v1), last revised 23 Feb 2026 (this version, v4)]
Title: From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
Authors: Liad Erez, Tomer Koren
Abstract: We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime, where the sum of rewards is assumed to be bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\epsilon,\delta)$-PAC variant of the problem, for which we design an algorithm that returns an $\epsilon$-optimal policy with high probability using a sample complexity of $\tilde{O}((\mathrm{poly}(K/m)+sm/\epsilon^2) \log(|\Pi|/\delta))$, where $\Pi$ is the underlying (finite) policy class and $s$ is the sparsity parameter. This bound improves upon known bounds for combinatorial semi-bandits whene...
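To get a feel for why sparsity matters in the stated bound, the back-of-the-envelope sketch below evaluates its $sm/\epsilon^2 \cdot \log(|\Pi|/\delta)$ term under illustrative parameter values, and compares it against the same term with the sparsity level replaced by $K$ (i.e., what the term would cost if rewards were not sparse). All numbers, including the assumed size of $\log|\Pi|$, are made up for illustration; the baseline is this comparison only, not a bound from the literature.

```python
import math

# Illustrative (assumed) parameter values; not taken from the paper.
K, m = 10_000, 10           # catalogue size, recommendation-list length
s = 3                       # sparsity: expected purchases per customer, s << K
eps, delta = 0.05, 0.01
log_Pi = 20 * math.log(K)   # assumed log of the policy-class size, log|Pi|

def dominant_term(sparsity):
    """The sparsity * m / eps^2 * log(|Pi|/delta) part of the stated bound."""
    return sparsity * m / eps**2 * (log_Pi + math.log(1 / delta))

sparse = dominant_term(s)   # cost under s-sparse rewards
dense = dominant_term(K)    # same term if sparsity were as large as K
print(f"sparse: {sparse:.3g}, dense: {dense:.3g}, saving: {dense / sparse:.0f}x")
```

Since the term is linear in the sparsity level, the saving is exactly the factor $K/s$, which is large precisely in the recommendation-system regime the paper targets.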