[2410.23029] Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

arXiv - Machine Learning · 4 min read

Summary

This paper explores risk-aware decision-making in restless bandits, proposing new algorithms for planning and learning that incorporate risk mitigation strategies.

Why It Matters

Understanding risk-aware decision-making is crucial for applications in fields like healthcare and resource management, where minimizing downside risks can significantly impact outcomes. This research advances the theory and practical algorithms for addressing these challenges in restless bandits.

Key Takeaways

  • Introduces risk-aware objectives in the restless bandits framework.
  • Establishes indexability conditions that license a Whittle index policy under risk-aware objectives (sketched after this list).
  • Proposes a Thompson sampling approach for learning under uncertainty.
  • Demonstrates the efficacy of the proposed methods through numerical experiments.
  • Highlights applications in machine replacement and patient scheduling.
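
The planning result rests on the Whittle index: when the problem is indexable, each arm is assigned a scalar index as a function of its own current state, and the controller simply activates the arms whose current states carry the highest indices. Below is a minimal, hypothetical sketch of that activation rule, not the paper's algorithm; the index table is filled with random numbers where the paper would instead solve each arm's risk-aware subproblem:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_states, budget = 5, 3, 2   # activate at most `budget` of the 5 arms

# Hypothetical placeholder: in the paper these indices would come from the
# risk-aware per-arm analysis, not from random draws.
whittle_index = rng.normal(size=(n_arms, n_states))
state = rng.integers(n_states, size=n_arms)  # current state of each arm

def select_arms(index_table, state, budget):
    """Activate the `budget` arms whose current states have the highest index."""
    scores = index_table[np.arange(len(state)), state]
    return np.argsort(scores)[-budget:]

print("activated arms:", sorted(select_arms(whittle_index, state, budget).tolist()))
```

The appeal of indexability is this decoupling: arms are ranked one at a time instead of searching a joint action space that grows exponentially with the number of arms.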

Abstract

Computer Science > Machine Learning · arXiv:2410.23029 (cs)
[Submitted on 30 Oct 2024 (v1), last revised 19 Feb 2026 (this version, v3)]
Authors: Nima Akbarzadeh, Yossiri Adulyasak, Erick Delage

In restless bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), each of which is a Markov decision process. In this work, we generalize the traditional risk-neutral restless bandits problem by incorporating risk-awareness, which is particularly important in real-world applications where the decision maker seeks to mitigate downside risks. We establish indexability conditions for the risk-aware objective and provide, for the first time, a Whittle-index-based solution to the planning problem, for both finite-horizon non-stationary and infinite-horizon stationary Markov decision processes. In addition, we address the learning problem, where the true transition probabilities are unknown, by proposing a Thompson sampling approach, and we show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing ri...
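
For the learning side, the abstract describes an episodic Thompson sampling scheme over the unknown transition probabilities. The following self-contained sketch shows the shape of such a loop under assumptions that are ours, not the paper's: Dirichlet posteriors per (arm, action, state), and a myopic one-step index as a hypothetical stand-in for the risk-aware Whittle index.

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms, n_states, budget = 4, 3, 1
n_episodes, horizon = 50, 10

# Unknown ground truth: per-arm transition matrices for the passive (0) and
# active (1) actions, plus a state-dependent reward (higher state = better).
true_P = rng.dirichlet(np.ones(n_states), size=(n_arms, 2, n_states))
reward = np.linspace(0.0, 1.0, n_states)

# Dirichlet posterior pseudo-counts over next states, per (arm, action, state).
counts = np.ones((n_arms, 2, n_states, n_states))

for episode in range(n_episodes):
    # 1. Sample one plausible model from the posterior; normalizing Gamma
    #    draws along the last axis is equivalent to Dirichlet sampling.
    g = rng.gamma(counts)
    P_hat = g / g.sum(axis=-1, keepdims=True)

    state = rng.integers(n_states, size=n_arms)
    for t in range(horizon):
        # 2. Plan under the sampled model. Stand-in index: the expected
        #    one-step reward gain from activating each arm.
        rows = np.arange(n_arms)
        gain = (P_hat[rows, 1, state] - P_hat[rows, 0, state]) @ reward
        action = np.zeros(n_arms, dtype=int)
        action[np.argsort(gain)[-budget:]] = 1

        # 3. Act in the true environment and update the posterior counts.
        for i in range(n_arms):
            s_next = rng.choice(n_states, p=true_P[i, action[i], state[i]])
            counts[i, action[i], state[i], s_next] += 1
            state[i] = s_next
```

Per the abstract, the paper's analysis shows that this style of posterior sampling achieves regret growing sublinearly with the number of episodes and quadratically with the number of arms.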

Related Articles

LLMs

[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.

The problem If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an after...

Reddit - Machine Learning · 1 min
Machine Learning

[R] Architecture Determines Optimization: Deriving Weight Updates from Network Topology (seeking arXiv endorsement - cs.LG)

Abstract: We derive neural network weight updates from first principles without assuming gradient descent or a specific loss function. St...

Reddit - Machine Learning · 1 min
Machine Learning

[P] ML project (XGBoost + Databricks + MLflow) — how to talk about “production issues” in interviews?

Hey all, I recently built an end-to-end fraud detection project using a large banking dataset: Trained an XGBoost model Used Databricks f...

Reddit - Machine Learning · 1 min
Machine Learning

[D] The memory chip market lost tens of billions over a paper this community would have understood in 10 minutes

TurboQuant was teased recently and tens of billions gone from memory chip market in 48 hours but anyone in this community who read the pa...

Reddit - Machine Learning · 1 min