[2602.16183] Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback
Summary
This paper presents a multi-agent combinatorial multi-armed bandit framework for the Submodular Welfare Problem, achieving a $\tilde{\mathcal{O}}(T^{2/3})$ regret bound under bandit feedback against the $(1-1/e)$ benchmark.
Why It Matters
The study enhances existing models of the Submodular Welfare Problem by introducing a multi-agent framework that operates under bandit feedback, addressing limitations of previous single-agent models. This advancement is crucial for applications in resource allocation and decision-making in uncertain environments, where agents cannot communicate.
Key Takeaways
- Introduces a multi-agent framework for the Submodular Welfare Problem.
- Achieves a regret bound of $\tilde{\mathcal{O}}(T^{2/3})$ under bandit feedback.
- Extends classical algorithms to scenarios with non-communicating agents.
- Proposes an explore-then-commit strategy with randomized assignments.
- Addresses shared allocation constraints among agents.
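The explore-then-commit idea in the takeaways above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes modular (additive) per-item utilities and noiseless-mean estimation via sample averages, whereas the paper handles general monotone submodular utilities and shared allocation constraints. All names (`etc_partition`, `noisy_value`) are hypothetical.

```python
import random

def etc_partition(items, agents, noisy_value, T, explore_frac=2 / 3):
    """Explore-then-commit sketch (simplified, modular-utility assumption):
    spend ~T^{2/3} rounds estimating each agent's mean reward per item via
    randomized assignments, then commit to a greedy partition."""
    explore_rounds = int(T ** explore_frac)  # O(T^{2/3}) exploration budget
    est = {(a, i): 0.0 for a in agents for i in items}
    counts = {(a, i): 0 for a in agents for i in items}
    # Exploration phase: each agent independently samples a random item
    # (no communication between agents is needed for this step).
    for _ in range(explore_rounds):
        for a in agents:
            i = random.choice(items)
            r = noisy_value(a, i)
            counts[(a, i)] += 1
            est[(a, i)] += (r - est[(a, i)]) / counts[(a, i)]  # running mean
    # Commit phase: assign each item to the agent with the highest estimate.
    partition = {a: [] for a in agents}
    for i in items:
        best = max(agents, key=lambda a: est[(a, i)])
        partition[best].append(i)
    return partition
```

The $T^{2/3}$ exploration length is the standard explore-then-commit trade-off: estimation error after $T^{2/3}$ samples and the commitment loss over the remaining rounds balance to give $\tilde{\mathcal{O}}(T^{2/3})$ regret overall.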
Computer Science > Computer Science and Game Theory
arXiv:2602.16183 (cs) [Submitted on 18 Feb 2026]
Title: Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback
Authors: Subham Pokhriyal, Shweta Jain, Vaneet Aggarwal
Abstract: We study the \emph{Submodular Welfare Problem} (SWP), where items are partitioned among agents with monotone submodular utilities to maximize the total welfare under \emph{bandit feedback}. Classical SWP assumes full value-oracle access, achieving $(1-1/e)$ approximations via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions under full-bandit feedback with non-communicating agents. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments, achieving $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based Submodular Welfare Problem under bandit feedback.
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2602.16183 [cs.GT] ...
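The classical full-information baseline mentioned in the abstract can be sketched as follows. Note the hedge: the simple item-by-item greedy shown here is only a $1/2$-approximation for monotone submodular utilities; the $(1-1/e)$ guarantee the abstract cites requires the continuous-greedy algorithm. The function name `greedy_swp` and the coverage utilities are illustrative, not from the paper.

```python
def greedy_swp(items, utilities):
    """Item-by-item greedy for the Submodular Welfare Problem with full
    value-oracle access: assign each item to the agent whose monotone
    submodular utility has the largest marginal gain."""
    alloc = {a: set() for a in utilities}
    for i in items:
        # Marginal gain of item i for each agent, given its current bundle.
        gains = {a: u(alloc[a] | {i}) - u(alloc[a]) for a, u in utilities.items()}
        best = max(gains, key=gains.get)
        alloc[best].add(i)
    return alloc

# Example with coverage utilities (each item covers a set of elements,
# and an agent's utility is the number of distinct elements covered).
cover_a = {1: {"x", "y"}, 2: {"x"}}
cover_b = {1: {"x"}, 2: {"y", "z"}}
u_a = lambda S: len(set().union(set(), *(cover_a[i] for i in S)))
u_b = lambda S: len(set().union(set(), *(cover_b[i] for i in S)))
print(greedy_swp([1, 2], {"a": u_a, "b": u_b}))  # → {'a': {1}, 'b': {2}}
```

Coverage functions are a standard example of monotone submodularity: once agent a holds item 1 (covering x and y), item 2 adds nothing for it, so the greedy correctly routes item 2 to agent b.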