[2604.02527] Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
Computer Science > Machine Learning
arXiv:2604.02527 (cs)
[Submitted on 2 Apr 2026]
Title: Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
Authors: Adam Bayley, Xiaodan Zhu, Raquel Aoki, Yanshuai Cao, Kevin H. Wilson
Abstract: The recent advancement of Large Language Models (LLMs) offers new opportunities to generate user preference data to warm-start bandits. Recent studies on contextual bandits with LLM initialization (CBLI) have shown that these synthetic priors can significantly lower early regret. However, these findings assume that LLM-generated choices are reasonably aligned with actual user preferences. In this paper, we systematically examine how LLM-generated preferences perform when random and label-flipping noise is injected into the synthetic training data. For aligned domains, we find that warm-starting remains effective up to 30% corruption, loses its advantage around 40%, and degrades performance beyond 50%. When there is systematic misalignment, even without added noise, LLM-generated priors can lead to higher regret than a cold-start bandit. To explain these behaviors, we develop a theoretical analysis that decomposes the effect of random label noise and systematic misalignment on the prior error driving the bandit's regret, and...
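The label-flipping corruption the abstract describes can be sketched as follows. This is a hypothetical illustration of the noise-injection step (the paper's actual experimental code is not shown here): each binary preference label in the synthetic warm-start data is flipped with a fixed probability, e.g. 0.3 for the 30% corruption level.

```python
import random

def inject_label_flip_noise(labels, flip_rate, seed=0):
    """Flip each binary preference label with probability `flip_rate`.

    Hypothetical helper mirroring the corruption setup described in
    the abstract; function name and interface are assumptions.
    """
    rng = random.Random(seed)
    return [1 - y if rng.random() < flip_rate else y for y in labels]

# Synthetic LLM-generated preference labels (illustrative values)
clean = [0, 1, 1, 0, 1]
corrupted = inject_label_flip_noise(clean, flip_rate=0.3)
```

A warm-started bandit would then fit its prior on `corrupted` instead of `clean`, letting one measure how regret varies with the corruption level.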