[2603.01283] Beyond Reward: A Bounded Measure of Agent Environment Coupling
Computer Science > Artificial Intelligence
arXiv:2603.01283 (cs)
[Submitted on 1 Mar 2026]

Title: Beyond Reward: A Bounded Measure of Agent Environment Coupling
Authors: Wael Hafez, Cameron Reid, Amit Nazeri

Abstract: Real-world reinforcement learning (RL) agents operate in closed-loop systems where actions shape future observations, making reliable deployment under distribution shifts a persistent challenge. Existing monitoring relies on reward or task metrics, which capture outcomes but miss early coupling failures. We introduce bipredictability (P), the ratio of shared information in the observation-action-outcome loop to the total available information: a principled, real-time measure of interaction effectiveness with provable bounds, comparable across tasks. An auxiliary monitor, the Information Digital Twin (IDT), computes P and its diagnostic components from the interaction stream. We evaluate SAC and PPO agents on MuJoCo HalfCheetah under eight agent- and environment-side perturbations across 168 trials. Under nominal operation, agents exhibit P = 0.33 ± 0.02, below the classical bound of 0.5, revealing an informational cost of action selection. The IDT detects 89.3% of perturbations versus 44.0% for reward-based monitoring, with 4.4x lower median latency. Bipredictability enables early detection of interaction degradation...
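The abstract describes bipredictability (P) as the ratio of shared information in the observation-action-outcome loop to the total available information. The paper's exact estimator is not given here, so the following is only a minimal sketch under an assumed discrete-variable formulation: shared information is taken as the mutual information between the (observation, action) context and the outcome, normalized by the outcome entropy, which yields a quantity bounded in [0, 1]. All function names and the normalization choice are illustrative, not the authors' implementation.

```python
# Illustrative sketch of a bounded coupling ratio; the normalization by
# outcome entropy is an assumption, not the paper's definition of P.
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy (bits) of an empirical distribution over samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def bipredictability(obs, act, out):
    """Hypothetical P: shared information between the (obs, act) context
    and the outcome, normalized by outcome entropy. Bounded in [0, 1]."""
    ctx = list(zip(obs, act))
    total = entropy(out)
    if total == 0.0:
        return 0.0  # no outcome variability, nothing to predict
    return mutual_information(ctx, out) / total
```

Under this toy definition, a deterministic outcome (e.g. `out = obs XOR act`) gives P = 1, while outcomes independent of the context give P = 0; a monitor such as the IDT described in the abstract would watch for P dropping below its nominal operating level.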