[2503.00509] Functional multi-armed bandit and the best function identification problems

arXiv - AI 4 min read Article

Summary

This article introduces the functional multi-armed bandit (FMAB) problem and the best function identification problem, and proposes F-LCB, a reduction scheme for constructing UCB-type algorithms that optimize decision-making under limited feedback.

Why It Matters

FMAB and best function identification matter because they model real-world optimization problems in which each choice is an entire unknown black-box function rather than a single arm, as in competitive LLM training. The proposed algorithms could improve performance in such settings, making this research relevant to ongoing advances in machine learning and artificial intelligence.

Key Takeaways

  • Introduces functional multi-armed bandit (FMAB) and best function identification problems.
  • Proposes a new algorithm (F-LCB) for optimizing decision-making with limited feedback.
  • Demonstrates the application of these problems in competitive LLM training.
  • Provides regret upper bounds based on convergence rates of base algorithms.
  • Includes numerical experiments showcasing the proposed scheme's performance.

Computer Science > Machine Learning
arXiv:2503.00509 (cs)
[Submitted on 1 Mar 2025 (v1), last revised 17 Feb 2026 (this version, v2)]

Title: Functional multi-armed bandit and the best function identification problems
Authors: Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Anastasiia Soboleva

Abstract: Bandit optimization usually refers to the class of online optimization problems with limited feedback: a decision maker uses only the objective value at the current point to make a new decision and does not have access to the gradient of the objective function. While this name accurately captures the limitation in feedback, it is somewhat misleading, since it has no connection with the multi-armed bandit (MAB) problem class. We propose two new classes of problems: the functional multi-armed bandit problem (FMAB) and the best function identification problem. They are modifications of the multi-armed bandit problem and the best arm identification problem, respectively, in which each arm represents an unknown black-box function. These problem classes are a surprisingly good fit for modeling real-world problems such as competitive LLM training. To solve problems from these classes, we propose a new reduction scheme to construct UCB-type algorithms, namely the F-LCB algorithm, based on algorithms f...
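The abstract describes the F-LCB idea only at a high level: each arm wraps an unknown black-box function together with a base optimization algorithm, and the scheme uses the base algorithms' convergence rates to build confidence bounds. The following is a hypothetical toy sketch of that idea for minimization, not the paper's actual algorithm: at each round it advances the arm whose lower confidence bound (current value minus an assumed convergence-rate width) is smallest. The `GradientArm` class, the O(1/√t) width, and all parameters are illustrative assumptions.

```python
import math

class GradientArm:
    """Toy base algorithm: gradient descent on f(x) = (x - c)^2 + b.

    Stands in for an 'arm = unknown black-box function plus a base
    optimization algorithm'; this concrete form is illustrative only."""
    def __init__(self, c, b, lr=0.3):
        self.c, self.b, self.x, self.lr = c, b, 5.0, lr

    def step(self):
        self.x -= self.lr * 2.0 * (self.x - self.c)  # one gradient step
        return (self.x - self.c) ** 2 + self.b       # current objective value

def f_lcb_sketch(arms, horizon, width=lambda t: 1.0 / math.sqrt(t)):
    """Hypothetical LCB-style reduction over K black-box arms (minimization).

    `width(t)` is an assumed convergence-rate bound on how far an arm's
    current value can still sit above its minimum after t base steps."""
    values, counts = [], []
    for arm in arms:                      # initialize: one step per arm
        values.append(arm.step())
        counts.append(1)
    for _ in range(horizon - len(arms)):
        # Advance the arm whose lower confidence bound is smallest,
        # i.e. be optimistic about which function can reach the lowest value.
        i = min(range(len(arms)),
                key=lambda j: values[j] - width(counts[j]))
        values[i] = arms[i].step()
        counts[i] += 1
    # Best function identification: report the arm with the lowest value seen.
    best = min(range(len(arms)), key=lambda j: values[j])
    return best, counts
```

For example, with three quadratic arms whose minima are 1.0, 0.2, and 0.5, the sketch concentrates its optimization steps on the second arm and returns it as the best function.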
