[2604.08577] Distributionally Robust Token Optimization in RLHF
Computer Science > Machine Learning

arXiv:2604.08577 (cs)

[Submitted on 27 Mar 2026]

Title: Distributionally Robust Token Optimization in RLHF

Authors: Yeping Jin, Jiaming Hu, Ioannis Ch. Paschalidis

Abstract: Large Language Models (LLMs) tend to respond correctly to prompts that align with the data they were trained and fine-tuned on. Yet small shifts in wording, format, or language can trigger surprisingly large failures, especially on multi-step reasoning problems. To address this problem, we propose Distributionally Robust Token Optimization (DRTO), an approach that combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization (DRO). DRTO bounds worst-case token-wise rewards by constructing an f-divergence ambiguity set over a loss minibatch, yielding theoretical robustness guarantees. Empirically, DRTO improves consistency under distribution shift on mathematical reasoning benchmarks, achieving a 9.17% improvement on GSM8K and a 2.49% improvement on MathQA.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2604.08577 [cs.LG] (or arXiv:2604.08577v1 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2604.08577 (arXiv-issued DOI via DataCite)

Submission history
From: Yeping Jin
[v1] Fri, 27 Mar 2026 21:36:32 UTC (330 ...
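
The abstract's core mechanism, bounding worst-case token-wise losses through an f-divergence ambiguity set over a minibatch, can be illustrated with a small sketch. The snippet below assumes KL divergence as the f-divergence and uses the standard dual form of the KL-constrained worst-case expectation; the function name kl_dro_loss, the radius rho, and the grid-searched dual variable eta are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of a KL-ball DRO objective over per-token losses.
# Assumption: the f-divergence is KL; the paper's exact DRTO objective
# and hyperparameters (rho, the eta grid) are not specified here.
import torch

def kl_dro_loss(token_losses: torch.Tensor, rho: float = 0.1) -> torch.Tensor:
    """Upper-bound the worst-case expected loss over distributions q with
    KL(q || empirical) <= rho, via the dual form
        min_{eta > 0}  eta * log E_p[exp(loss / eta)] + eta * rho,
    minimizing the 1-D dual with a coarse grid search over eta."""
    n = token_losses.numel()
    etas = torch.logspace(-2, 2, steps=50, device=token_losses.device)
    # log E_p[exp(loss / eta)], computed stably with logsumexp
    log_mgf = torch.stack([
        torch.logsumexp(token_losses / eta, dim=0)
        - torch.log(torch.tensor(float(n)))
        for eta in etas
    ])
    duals = etas * log_mgf + etas * rho
    return duals.min()

# Usage: plug into an RLHF-style update where token_losses are, e.g.,
# negative token-level advantages under the current policy.
losses = torch.randn(256).abs()          # stand-in per-token losses
robust = kl_dro_loss(losses, rho=0.05)   # >= losses.mean(); -> mean as rho -> 0
print(float(losses.mean()), float(robust))
```

Any fixed eta > 0 already yields a valid upper bound on the worst-case loss, so the coarse grid search only tightens that bound; in a full training loop the dual variable could instead be optimized jointly with the policy parameters.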