[2511.04454] Fitting Reinforcement Learning Model to Behavioral Data under Bandits
Computer Science > Computational Engineering, Finance, and Science
arXiv:2511.04454 (cs)
[Submitted on 6 Nov 2025 (v1), last revised 26 Mar 2026 (this version, v2)]
Title: Fitting Reinforcement Learning Model to Behavioral Data under Bandits
Authors: Hao Zhu, Jasper Hoffmann, Baohe Zhang, Joschka Boedecker
Abstract: We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem formulation for the fitting problem of a wide range of RL models that appear frequently in scientific research applications. We then provide a detailed theoretical analysis of its convexity properties. Based on the theoretical results, we introduce a novel solution method for the fitting problem of RL models based on convex relaxation and optimization. Our method is then evaluated in several simulated and real-world bandit environments to compare with some benchmark methods that appear in the literature. Numerical results indicate that our method achieves comparable performance to the state-of-the-art, while significantly reducing computation time. We also provide an open-source Python package f...
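For readers unfamiliar with the fitting problem the abstract refers to, the following is a minimal sketch of a common maximum-likelihood formulation. It is not the convex-relaxation method the paper proposes and it does not use the paper's Python package. It assumes a delta-rule Q-learning agent with a softmax policy, and every name in it (`negative_log_likelihood`, `fit_grid`, `alpha`, `beta`) is hypothetical and chosen only for illustration.

```python
import math


def negative_log_likelihood(actions, rewards, n_arms, alpha, beta):
    """Negative log-likelihood of observed choices under an assumed
    Q-learning agent (learning rate alpha) with a softmax policy
    (inverse temperature beta). Illustrative only."""
    q = [0.0] * n_arms  # assumed initial action values
    nll = 0.0
    for a, r in zip(actions, rewards):
        # Log-softmax of the chosen arm, shifted by the max for stability.
        logits = [beta * v for v in q]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        nll -= logits[a] - log_z
        # Delta-rule update of the chosen arm only.
        q[a] += alpha * (r - q[a])
    return nll


def fit_grid(actions, rewards, n_arms, alphas, betas):
    """Brute-force grid search over (alpha, beta); a simple stand-in
    for the optimization step, not the paper's solver."""
    best = None
    for alpha in alphas:
        for beta in betas:
            nll = negative_log_likelihood(actions, rewards, n_arms, alpha, beta)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best  # (nll, alpha, beta)
```

Calling `fit_grid(actions, rewards, 2, [0.1, 0.5, 0.9], [0.5, 1.0, 5.0])` on a recorded sequence of arm indices and rewards returns the grid point with the lowest negative log-likelihood. The objective is generally non-convex in `alpha`, which is why a grid search is used in this sketch.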