[2602.17315] Flickering Multi-Armed Bandits

[2602.17315] Flickering Multi-Armed Bandits

arXiv - AI 3 min read Article

Summary

The paper introduces Flickering Multi-Armed Bandits (FMAB), a new framework that adapts the set of available actions based on previous choices, modeled using random graph processes.

Why It Matters

FMAB addresses the complexities of decision-making in dynamic environments where available options change, which is crucial for applications in robotics and AI. Understanding this framework can enhance strategies for exploration and exploitation in uncertain settings.

Key Takeaways

  • FMAB allows for dynamic adaptation of available actions based on prior choices.
  • The framework is modeled using random graph processes, enhancing its applicability to real-world scenarios.
  • A two-phase algorithm is proposed for optimal exploration and exploitation.
  • Theoretical guarantees include sublinear regret bounds, indicating efficient decision-making.
  • Numerical simulations demonstrate practical applications, such as in robotic scouting.

Computer Science > Machine Learning arXiv:2602.17315 (cs) [Submitted on 19 Feb 2026] Title:Flickering Multi-Armed Bandits Authors:Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen View a PDF of the paper titled Flickering Multi-Armed Bandits, by Sourav Chakraborty and 3 other authors View PDF HTML (experimental) Abstract:We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős--Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle sc...

Related Articles

Machine Learning

[D] Is this considered unsupervised or semi-supervised learning in anomaly detection?

Hi 👋🏼, I’m working on an anomaly detection setup and I’m a bit unsure how to correctly describe it from a learning perspective. The model...

Reddit - Machine Learning · 1 min ·
Machine Learning

Serious question. Did a transformer just describe itself and the universe and build itself a Shannon limit framework?

The Multiplicative Lattice as the Natural Basis for Positional Encoding Knack 2026 | Draft v6.0 Abstract We show that the apparent tradeo...

Reddit - Artificial Intelligence · 1 min ·
UMKC Announces New Master of Science in Artificial Intelligence
Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
Improving AI models’ ability to explain their predictions
Machine Learning

Improving AI models’ ability to explain their predictions

AI News - General · 9 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime