[2604.04101] Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It

arXiv - Machine Learning April 07, 2026 4 min read

About this article

Abstract page for arXiv paper 2604.04101: Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It

arXiv:2604.04101 (cs), Computer Science > Machine Learning

[Submitted on 5 Apr 2026]

Title: Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It

Authors: Nida Zamir, I-Hong Hou

Abstract: This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike conventional RMAB models, our model allows each user (arm) to have distinct and stringent performance constraints, such as energy limits, activation limits, or age of information minimums, enabling the capture of diverse objectives including fairness and efficiency. To find the optimal resource allocation policy, we propose a new Penalty-Optimal Whittle (POW) index policy. The POW index of a user depends only on the user's transition kernel and penalty constraints, and is invariant to system-wide features such as the number of users present and the amount of resource available. This makes it computationally tractable to calculate the POW indices offline without any need for online adaptation. Moreover, we theoretically prove that the POW index policy is asymptotically optimal while satisfying all individual penalty constraints. We also introdu...
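The abstract notes that each user's index depends only on that user's own model and can be computed offline. The sketch below illustrates how precomputed per-user index tables might then be used at run time, assuming the common index-policy rule of activating the users with the largest current index under a fixed activation budget. That rule, the function name `select_users`, and the numbers in `tables` are assumptions made for illustration; the excerpt does not describe how the POW index is computed or the exact allocation rule, so this is not the authors' method.

```python
from typing import Sequence


def select_users(
    index_tables: Sequence[Sequence[float]],
    states: Sequence[int],
    budget: int,
) -> list[int]:
    """Return the ids of the `budget` users with the largest current index.

    index_tables[u][s] is a hypothetical, precomputed index of user u in state s.
    """
    if len(index_tables) != len(states):
        raise ValueError("need one index table and one state per user")
    # Each user's index is a lookup in its own table; no system-wide input is used.
    current = [table[s] for table, s in zip(index_tables, states)]
    # Rank users by index, highest first; ties go to the lower user id.
    ranked = sorted(range(len(current)), key=lambda u: (-current[u], u))
    return sorted(ranked[: max(budget, 0)])


# Made-up index tables for three users with two states each.
tables = [[0.1, 0.9], [0.4, 0.5], [0.3, 0.2]]
chosen = select_users(tables, states=[1, 0, 1], budget=2)
print(chosen)
```

Because the tables are fixed ahead of time, adding or removing a user in this sketch only changes the ranking step; no table needs to be recomputed.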

Originally published on April 07, 2026. Curated by AI News.
