[2011.07687] DART: aDaptive Accept RejecT for non-linear top-K subset identification

[2011.07687] DART: aDaptive Accept RejecT for non-linear top-K subset identification

arXiv - Machine Learning 4 min read Article

Summary

The paper presents DART, a novel algorithm for non-linear top-K subset identification in bandit problems, achieving efficient performance without relying on individual arm feedback.

Why It Matters

DART addresses the challenges of selecting optimal subsets in complex environments where rewards are non-linear and correlated. This advancement has significant implications for applications in areas like marketing and resource allocation, enhancing decision-making processes.

Key Takeaways

  • DART algorithm efficiently identifies top-K subsets without individual arm feedback.
  • Achieves a regret bound of \(\tilde{\mathcal{O}}(K\sqrt{KNT})\), matching theoretical lower bounds.
  • Outperforms existing methods in both linear and non-linear reward environments.
  • Applicable to practical problems like cross-selling optimization.
  • Utilizes confidence bounds to sequentially eliminate poor options.

Computer Science > Machine Learning arXiv:2011.07687 (cs) [Submitted on 16 Nov 2020 (v1), last revised 12 Feb 2026 (this version, v2)] Title:DART: aDaptive Accept RejecT for non-linear top-K subset identification Authors:Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek Umrawal View a PDF of the paper titled DART: aDaptive Accept RejecT for non-linear top-K subset identification, by Mridul Agarwal and 3 other authors View PDF HTML (experimental) Abstract:We consider the bandit problem of selecting $K$ out of $N$ arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among $\binom{N}{K}$ options, making the action space large. To simplify the problem, existing works on combinatorial bandits {typically} assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-$K$ subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in $N$. Further, DART achieves a regret bound of $\tilde{\m...

Related Articles

Machine Learning

Artificial intelligence - Machine Learning, Robotics, Algorithms

AI Events ·
Machine Learning

Fed Chair Jerome Powell, Treasury's Bessent and top bank CEOs met over Anthropic's Mythos model

submitted by /u/esporx [link] [comments]

Reddit - Artificial Intelligence · 1 min ·
CoreWeave strikes a deal to power Anthropic's Claude AI models — and the stock surges 12%
Llms

CoreWeave strikes a deal to power Anthropic's Claude AI models — and the stock surges 12%

AI Tools & Products · 3 min ·
New AI model sparks alarm as governments brace for AI-driven cyberattacks
Machine Learning

New AI model sparks alarm as governments brace for AI-driven cyberattacks

AI Tools & Products · 6 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime