[2506.14067] From Bandit Regret to FDR Control: Online Selective Generation with Adversarial Feedback Unlocking
arXiv:2506.14067 (Computer Science > Machine Learning)
[Submitted on 16 Jun 2025 (v1), last revised 5 Mar 2026 (this version, v3)]
Authors: Minjae Lee, Yoonjae Jung, Sangdon Park
Abstract: As interactive generative systems are increasingly deployed in real-world applications, their tendency to generate unreliable or false responses raises serious concerns. Selective generation mitigates this risk by ensuring that the system answers only when confident. However, real-world deployments typically provide only partial user feedback on the selected response (e.g., thumbs up/down) and often operate in non-stationary or adversarial environments, for which effective learning methods are largely missing. To bridge this gap, we propose ExSUL, a novel online learning framework for selective generation with adversarial bandit feedback. Technically, we introduce (i) a novel conversion lemma that translates the regret of any bandit algorithm into an FDR bound, and (ii) feedback unlocking, a strategy that exploits the structure of selective generation to extract additional learning signals from partial feedback. We prove that ExSUL achieves a regret bound of $O(\sqrt{T \ln |H|})$, matching the efficiency and FDR c...
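The abstract describes the setting only at a high level. The following is a minimal sketch, built entirely on our own assumptions, of online selective generation with one-sided feedback. It is not ExSUL and does not reproduce the paper's conversion lemma or its feedback-unlocking procedure. The function name `online_selective_generation`, the abstention cost `alpha`, and the choice of a finite set `H` of confidence thresholds are all hypothetical. In each round the learner samples a threshold from exponential weights and answers only if the model's confidence clears it. The correctness bit (thumbs up/down) is visible only when it answers. Because every threshold that would answer shares that single bit, one observation updates all of them. Importance weighting by the probability of answering keeps that update unbiased. The quantity tracked is the empirical FDR, meaning the fraction of answered rounds whose answer was wrong.

```python
import math
import random


def online_selective_generation(scores, correct, thresholds, alpha=0.2, eta=None, seed=0):
    """Hypothetical sketch (not ExSUL): exponential weights over threshold selectors.

    scores[t]  -- confidence of the answer generated in round t
    correct[t] -- whether that answer is correct; the learner reads it only
                  in rounds where it actually answers (thumbs up/down)
    thresholds -- finite hypothesis set H; threshold tau answers iff score >= tau
    alpha      -- assumed fixed cost of abstaining, in [0, 1]
    Returns (empirical FDR over answered rounds, number of answered rounds).
    """
    rng = random.Random(seed)
    T, K = len(scores), len(thresholds)
    if eta is None:
        eta = math.sqrt(math.log(K) / max(T, 1)) if K > 1 else 1.0
    log_w = [0.0] * K
    answered = false_answers = 0

    for s, y in zip(scores, correct):
        # Normalise weights in log space so they never underflow to all zeros.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        p = [wi / z for wi in w]

        k = rng.choices(range(K), weights=p)[0]
        answers = s >= thresholds[k]
        # Probability that the sampled threshold answers in this round.
        p_answer = sum(pi for pi, tau in zip(p, thresholds) if s >= tau)

        wrong_loss = 0.0
        if answers:
            answered += 1
            if not y:
                false_answers += 1
                # Importance-weighted loss for answering thresholds; unbiased
                # because this bit is observed with probability p_answer.
                wrong_loss = 1.0 / p_answer

        for i, tau in enumerate(thresholds):
            if s >= tau:
                # Answering thresholds share the one observed correctness bit.
                log_w[i] -= eta * wrong_loss
            else:
                # Abstaining thresholds always pay the known cost alpha.
                log_w[i] -= eta * alpha

    fdr = false_answers / max(answered, 1)
    return fdr, answered
```

Call it on a stream of confidences and hidden correctness labels to get the running FDR and the number of answered rounds. Raising `alpha` makes abstention costlier, which shifts weight toward lower thresholds, so the learner answers more often at the price of a potentially higher FDR.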