[2602.04737] Rationality Measurement and Theory for Reinforcement Learning Agents

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2602.04737: Rationality Measurement and Theory for Reinforcement Learning Agents

Computer Science > Machine Learning
arXiv:2602.04737 (cs) [Submitted on 4 Feb 2026 (v1), last revised 3 May 2026 (this version, v2)]

Title: Rationality Measurement and Theory for Reinforcement Learning Agents
Authors: Kejiang Qian, Amos Storkey, Fengxiang He

Abstract: This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property that is increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, accumulated over the trajectory in deployment, is defined to be the expected rational risk; an empirical average version in training is also defined. Their difference, termed the rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. These are upper bounded by, respectively, (1) the $1$-Wasserstein distance between the transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from r...
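
The abstract only sketches the definitions, but the empirical rational risk lends itself to a simple illustration. The Python sketch below is not from the paper; all names (q_star, empirical_rational_risk, the toy trajectories) are hypothetical, and it assumes one plausible reading of the measure: the per-step gap between the value of the best action under a hidden true action-value function and the value of the action the policy actually took, summed over each trajectory and averaged over a batch.

```python
import numpy as np

# Illustrative sketch only: the paper's exact definitions (e.g. the "steepest
# direction" criterion and the precise trajectory aggregation) are not given in
# the excerpt. Here the per-step rationality loss is taken as the gap between
# the best action under a hidden true action-value function q_star and the
# action the policy actually took. All names are hypothetical placeholders.

def empirical_rational_risk(trajectories, q_star):
    """Average cumulative value discrepancy over a batch of trajectories.

    trajectories: list of trajectories, each a list of (state, action) pairs.
    q_star: callable q_star(state) -> array of action values (assumed known
            here for illustration; in the paper's setting it is hidden).
    """
    risks = []
    for traj in trajectories:
        gap = 0.0
        for state, action in traj:
            values = q_star(state)                # true values of all actions
            gap += values.max() - values[action]  # discrepancy vs. rational action
        risks.append(gap)
    return float(np.mean(risks))                  # empirical (training-side) estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy setting: 3 discrete states, 4 actions, random "true" value table.
    q_table = rng.normal(size=(3, 4))
    q_star = lambda s: q_table[s]

    # A uniformly random policy generates the (state, action) trajectories.
    trajectories = [
        [(rng.integers(3), rng.integers(4)) for _ in range(10)]
        for _ in range(50)
    ]
    print("empirical rational risk:", empirical_rational_risk(trajectories, q_star))
```

Under this reading, the deployment-side quantity would be the same average taken over trajectories from the deployment environment, and the rational risk gap is the difference between the two; the paper bounds its extrinsic and intrinsic parts by a $1$-Wasserstein distance and an empirical Rademacher complexity, respectively.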

Originally published on May 05, 2026. Curated by AI News.

Related Articles

CUDA Proves Nvidia Is a Software Company | WIRED
AI Infrastructure

There’s a deep, forbidding moat that surrounds Nvidia—and it has nothing to do with hardware.

Wired - AI · 9 min ·
[2511.02805] MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
LLMs

Abstract page for arXiv paper 2511.02805: MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Lea...

arXiv - AI · 3 min ·
[2510.22944] Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies
LLMs

Abstract page for arXiv paper 2510.22944: Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

arXiv - AI · 4 min ·
[2508.10880] Searching for Privacy Risks in LLM Agents via Simulation
LLMs

Abstract page for arXiv paper 2508.10880: Searching for Privacy Risks in LLM Agents via Simulation

arXiv - AI · 3 min ·