[2502.09257] From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

arXiv - AI · 4 min read

Summary

This paper studies contextual combinatorial semi-bandits and presents an algorithm with improved sample complexity in sparse-reward settings, which are particularly relevant to recommendation systems.

Why It Matters

Contextual combinatorial semi-bandits model applications such as recommendation systems, where a learner repeatedly selects a small slate of items and observes rewards only for the items it showed. By exploiting the sparsity of rewards in such settings, this work reduces the number of samples needed to learn a near-optimal policy, enabling more efficient decision-making in a range of AI applications.

Key Takeaways

  • Introduces an algorithm that achieves improved sample complexity for contextual combinatorial semi-bandits.
  • Focuses on the $s$-sparse regime, relevant for real-world applications like recommendation systems.
  • Proves improved sample complexity bounds compared to existing methods (a back-of-envelope illustration follows this list).
  • Extends the framework to list multiclass classification with bandit feedback, generalizing the standard bandit multiclass setting.
  • Establishes new regret bounds for adversarial data generation scenarios.
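To make the headline bound concrete, here is a back-of-envelope illustration. The values of $K$, $m$, $s$, and $\epsilon$ below are assumptions chosen for the example, not numbers from the paper, and the $K/\epsilon^2$ comparison point is the familiar non-sparse contextual bandit rate rather than the paper's exact baseline:

```latex
% Sample complexity from the abstract (dropping the \log(|\Pi|/\delta)
% factor, which is common to both sides of the comparison):
%   \mathrm{poly}(K/m) + sm/\epsilon^2
% Illustrative values (assumed): K = 10^5 items, m = 10 slots,
% s = 5 purchases per customer, \epsilon = 0.1.
\[
  \frac{sm}{\epsilon^2} \;=\; \frac{5 \cdot 10}{(0.1)^2} \;=\; 5{,}000
  \qquad \text{vs.} \qquad
  \frac{K}{\epsilon^2} \;=\; \frac{10^5}{(0.1)^2} \;=\; 10^7 .
\]
% Under these assumptions, the \epsilon-dependent term shrinks by a
% factor of K/(sm) = 2{,}000 once sparsity is exploited.
```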

Computer Science > Machine Learning

arXiv:2502.09257 (cs) [Submitted on 13 Feb 2025 (v1), last revised 23 Feb 2026 (this version, v4)]

Title: From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

Authors: Liad Erez, Tomer Koren

Abstract: We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime, where we assume that the sum of rewards is bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\epsilon,\delta)$-PAC variant of the problem, for which we design an algorithm that returns an $\epsilon$-optimal policy with high probability using a sample complexity of $\tilde{O}\big((\mathrm{poly}(K/m) + sm/\epsilon^2)\log(|\Pi|/\delta)\big)$, where $\Pi$ is the underlying (finite) policy class and $s$ is the sparsity parameter. This bound improves upon known bounds for combinatorial semi-bandits whene...
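The round structure described in the abstract is easy to sketch in code. Below is a minimal Python simulation of one round of the protocol; the reward model, the linear scoring policy, and names such as `d` and `W` are illustrative assumptions rather than anything from the paper. Only the interaction pattern (context in, $m$-subset out, semi-bandit feedback on the chosen actions, total reward at most $s$) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 50  # total number of actions (e.g., products)
m = 5   # size of the predicted subset
s = 3   # sparsity: the sum of rewards per round is at most s

d = 8                        # context dimension (assumed)
W = rng.normal(size=(d, K))  # parameters of a stand-in policy from Pi

def sparse_rewards():
    """Illustrative reward model: at most s of the K actions have
    reward 1 in a round (e.g., the items this customer would buy)."""
    relevant = rng.choice(K, size=rng.integers(0, s + 1), replace=False)
    r = np.zeros(K)
    r[relevant] = 1.0
    return r

def policy(context):
    """A stand-in policy: score all K actions linearly in the
    context and predict the top-m subset."""
    scores = context @ W
    return np.argsort(scores)[-m:]

# One round of the contextual combinatorial semi-bandit protocol:
x = rng.normal(size=d)  # 1. a context arrives
A = policy(x)           # 2. the learner predicts an m-subset of actions
r = sparse_rewards()    # 3. rewards are realized (sparse: sum <= s)
feedback = r[A]         # 4. semi-bandit feedback: the learner sees only
                        #    the rewards of the m actions it predicted
print(f"predicted {sorted(A.tolist())}, observed rewards {feedback}")
```

The detail the sketch makes visible is the feedback asymmetry: the learner never observes `r` in full, only `r[A]`, which is what separates semi-bandit feedback from full-information learning.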
