[2602.01528] Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
Computer Science > Computers and Society
arXiv:2602.01528 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 6 Apr 2026 (this version, v2)]

Title: Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
Authors: Qian Wang, Xuandong Zhao, Zirui Zhang, Zhanzhi Lou, Nuo Chen, Dawn Song, Bingsheng He

Abstract: Large language models (LLMs) increasingly serve as reasoners and automated evaluators, yet they remain susceptible to cognitive biases, often altering their reasoning when faced with spurious prompt-level cues such as consensus claims or authority appeals. Existing mitigations via prompting or supervised fine-tuning fail to generalize, as they modify surface behavior without changing the optimization objective that makes bias cues attractive. We propose Epistemic Independence Training (EIT), a reinforcement learning framework grounded in a key principle: to learn independence, bias cues must be made non-predictive of reward. EIT operationalizes this through a balanced conflict strategy in which bias signals are equally likely to support correct and incorrect answers, combined with a reward design that penalizes bias-following without rewarding bias agreement. Experiments on Qwen3-4B demonstrate that EIT improves both accuracy and robustness under adversarial bias...
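The balanced conflict strategy and the asymmetric reward described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data layout (`correct`/`wrong` fields, a single `cue_target` per example) and the concrete reward values (+1 / 0 / -1) are assumptions chosen only to show the two properties the abstract states, namely that the cue points at the correct answer half the time, and that following a wrong cue is penalized while agreeing with a correct cue earns nothing beyond ordinary correctness.

```python
import random


def make_balanced_conflict_batch(examples, seed=0):
    """Attach a bias cue to each example so that the cue is equally
    likely to endorse the correct answer or a wrong one, making the
    cue non-predictive of reward (hypothetical data layout)."""
    rng = random.Random(seed)
    batch = []
    for ex in examples:
        # With probability 0.5 the cue endorses the correct answer;
        # otherwise it endorses a randomly chosen wrong answer.
        if rng.random() < 0.5:
            cue_target = ex["correct"]
        else:
            cue_target = rng.choice(ex["wrong"])
        batch.append({**ex, "cue_target": cue_target})
    return batch


def eit_style_reward(answer, correct, cue_target):
    """Asymmetric reward (assumed values): correctness is rewarded,
    following a cue onto a wrong answer is penalized, and agreeing
    with a cue that happens to be correct gets no extra credit."""
    if answer == correct:
        return 1.0   # reward correctness only, even if it matches the cue
    if answer == cue_target:
        return -1.0  # penalize bias-following onto a wrong answer
    return 0.0       # ordinary wrong answer, no cue involvement
```

Because the cue agrees with the correct answer exactly half the time and agreement adds no reward, a policy gains nothing in expectation from attending to the cue, which is the non-predictivity property the abstract argues for.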