[2410.02605] Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
Summary
This paper derives a policy gradient theorem for Cumulative Prospect Theory (CPT) objectives in reinforcement learning and builds on it to introduce a first-order algorithm with statistical guarantees, enabling optimization of risk-sensitive, behaviorally motivated objectives.
Why It Matters
Understanding how Cumulative Prospect Theory can be applied to reinforcement learning is crucial for developing algorithms that better mimic human decision-making under risk. This research bridges behavioral economics and machine learning, potentially leading to more effective AI systems in uncertain environments.
Key Takeaways
- Introduces a policy gradient theorem for Cumulative Prospect Theory in RL.
- Develops a first-order policy gradient algorithm using Monte Carlo methods.
- Establishes statistical guarantees for the proposed algorithm.
- Proves asymptotic convergence to first-order stationary points of the (generally non-convex) CPT objective.
- Compares the new approach with existing zeroth-order methods through simulations.
Computer Science > Machine Learning
arXiv:2410.02605 (cs.LG)
[Submitted on 3 Oct 2024 (v1), last revised 17 Feb 2026 (this version, v4)]

Title: Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
Authors: Olivier Lepel, Anas Barakat

Abstract: We derive a policy gradient theorem for Cumulative Prospect Theory (CPT) objectives in finite-horizon Reinforcement Learning (RL), generalizing the standard policy gradient theorem and encompassing distortion-based risk objectives as special cases. Motivated by behavioral economics, CPT combines an asymmetric utility transformation around a reference point with probability distortion. Building on our theorem, we design a first-order policy gradient algorithm for CPT-RL using a Monte Carlo gradient estimator based on order statistics. We establish statistical guarantees for the estimator and prove asymptotic convergence of the resulting algorithm to first-order stationary points of the (generally non-convex) CPT objective. Simulations illustrate qualitative behaviors induced by CPT and compare our first-order approach to existing zeroth-order methods.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2410.02605 [cs.LG], https://doi.org/10.48550/arXiv.2410.02605
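To make the abstract's ingredients concrete, the sketch below estimates the CPT value of a return distribution from Monte Carlo samples using order statistics: sort the sampled returns, then weight each sorted outcome by differences of a distorted cumulative probability, with an asymmetric utility around a reference point. The specific utility and weighting functions and the parameter defaults (`alpha`, `lam`, `gamma` from Tversky and Kahneman's 1992 calibration) are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def cpt_value(samples, ref=0.0, alpha=0.88, lam=2.25, gamma=0.61):
    """Order-statistics Monte Carlo estimate of a CPT value.

    Illustrative sketch: power utility around a reference point `ref`,
    loss aversion factor `lam`, and an inverse-S probability weighting
    function with curvature `gamma` (Tversky-Kahneman-style defaults).
    """
    x = np.sort(np.asarray(samples, dtype=float))  # ascending order
    n = len(x)

    def u_plus(z):   # utility applied to gains above the reference point
        return np.maximum(z - ref, 0.0) ** alpha

    def u_minus(z):  # utility applied to losses below the reference point
        return lam * np.maximum(ref - z, 0.0) ** alpha

    def w(p):        # probability distortion (inverse-S weighting)
        return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

    i = np.arange(1, n + 1)
    # Gains: the i-th smallest outcome carries weight w((n-i+1)/n) - w((n-i)/n),
    # so the largest gains are weighted through the distorted tail probability.
    gain_w = w((n - i + 1) / n) - w((n - i) / n)
    # Losses: symmetric construction from the other end of the sorted sample.
    loss_w = w(i / n) - w((i - 1) / n)
    return float(np.sum(u_plus(x) * gain_w) - np.sum(u_minus(x) * loss_w))
```

With `alpha=1`, `lam=1`, `gamma=1` the distortion and asymmetry vanish and the estimate reduces to the sample mean of `x - ref`, recovering the standard risk-neutral objective as a special case; with the default parameters, loss aversion makes a symmetric gamble such as `[-1, 1]` net negative.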