[2602.17086] Dynamic Decision-Making under Model Misspecification: A Stochastic Stability Approach
Summary
This paper explores dynamic decision-making under model misspecification, focusing on Thompson Sampling (TS) in Bayesian reinforcement learning. It classifies posterior evolution in a two-armed Gaussian bandit and extends the analysis to a general model class, providing insigh...
Why It Matters
Understanding decision-making under model uncertainty is crucial in economics and machine learning. This research bridges Bayesian learning with evolutionary dynamics, offering a framework for improving algorithm performance in real-world applications where model specifications may be incorrect.
Key Takeaways
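- The paper identifies three regimes of posterior evolution in a misspecified two-armed Gaussian bandit: correct model concentration, incorrect model concentration, and persistent belief mixing.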
- It provides a unified stochastic stability framework that represents posterior evolution as a Markov process on the belief simplex.
- Two sufficient conditions are established for classifying ergodic and transient behavior.
- The regimes yield sharp predictions for the limiting beliefs, action frequencies, and asymptotic regret of Thompson Sampling under misspecification.
- This research lays the groundwork for robust decision-making in structured bandits.
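The two-armed Gaussian setting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `thompson_sampling_finite_models`, the parameters `true_means` and `model_means`, the uniform prior, and the known noise level `sigma` are all assumptions made for illustration. Each candidate model fixes a mean reward per arm, the sampler draws a model from the current posterior, plays that model's best arm (the model-action mapping), and updates the posterior with the Gaussian likelihood of the observed reward. The class is misspecified whenever `true_means` matches no row of `model_means`.

```python
import numpy as np


def thompson_sampling_finite_models(true_means, model_means, horizon=2000,
                                    sigma=1.0, seed=0):
    """Thompson Sampling over a finite model class in a Gaussian bandit.

    true_means  : mean reward of each arm in the real environment.
    model_means : array of shape (n_models, n_arms); row m is the vector of
                  arm means that candidate model m believes in.
    Returns the belief path (horizon x n_models) and per-arm pull counts.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    model_means = np.asarray(model_means, dtype=float)
    n_models, n_arms = model_means.shape

    # Uniform prior over models, kept in log space for numerical stability.
    log_post = np.full(n_models, -np.log(n_models))
    # Model-action mapping: the arm each model considers optimal.
    best_arm = model_means.argmax(axis=1)

    beliefs = np.empty((horizon, n_models))
    counts = np.zeros(n_arms, dtype=int)

    for t in range(horizon):
        # Normalize the posterior and record it before acting.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        beliefs[t] = post

        # Thompson step: sample a model, then play its optimal arm.
        m = rng.choice(n_models, p=post)
        arm = best_arm[m]
        counts[arm] += 1

        # The reward comes from the true environment, not from any model.
        reward = rng.normal(true_means[arm], sigma)

        # Bayes update with each model's Gaussian log-likelihood
        # (constants shared by all models are dropped).
        log_post += -0.5 * ((reward - model_means[:, arm]) / sigma) ** 2
        log_post -= log_post.max()

    return beliefs, counts


if __name__ == "__main__":
    # Hypothetical misspecified example: neither model matches the truth.
    beliefs, counts = thompson_sampling_finite_models(
        true_means=[0.4, 0.6],
        model_means=[[1.0, 0.0], [0.0, 1.0]],
    )
    print("final belief:", beliefs[-1])
    print("arm pull counts:", counts)
```

Because the posterior at each step depends only on the previous posterior and the new observation, the rows of `beliefs` trace one sample path of a Markov process on the belief simplex. Rerunning with different `true_means` and seeds is one way to explore informally when beliefs concentrate and when they keep mixing.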
Economics > Theoretical Economics
arXiv:2602.17086 (econ)
[Submitted on 19 Feb 2026]
Title: Dynamic Decision-Making under Model Misspecification: A Stochastic Stability Approach
Authors: Xinyu Dai, Daniel Chen, Yian Qian
Abstract: Dynamic decision-making under model uncertainty is central to many economic environments, yet existing bandit and reinforcement learning algorithms rely on the assumption of correct model specification. This paper studies the behavior and performance of one of the most commonly used Bayesian reinforcement learning algorithms, Thompson Sampling (TS), when the model class is misspecified. We first provide a complete dynamic classification of posterior evolution in a misspecified two-armed Gaussian bandit, identifying distinct regimes: correct model concentration, incorrect model concentration, and persistent belief mixing, characterized by the direction of statistical evidence and the model-action mapping. These regimes yield sharp predictions for limiting beliefs, action frequencies, and asymptotic regret. We then extend the analysis to a general finite model class and develop a unified stochastic stability framework that represents posterior evolution as a Markov process on the belief simplex. This approach characterizes two sufficient conditions to classify the ergodic and transient behavior...