[2506.14067] From Bandit Regret to FDR Control: Online Selective Generation with Adversarial Feedback Unlocking

arXiv - Machine Learning March 06, 2026 4 min read

About this article

Abstract page for arXiv paper 2506.14067: From Bandit Regret to FDR Control: Online Selective Generation with Adversarial Feedback Unlocking

Computer Science > Machine Learning

arXiv:2506.14067 (cs)

[Submitted on 16 Jun 2025 (v1), last revised 5 Mar 2026 (this version, v3)]

Title: From Bandit Regret to FDR Control: Online Selective Generation with Adversarial Feedback Unlocking

Authors: Minjae Lee, Yoonjae Jung, Sangdon Park

Abstract: As interactive generative systems are increasingly deployed in real-world applications, their tendency to generate unreliable or false responses raises serious concerns. Selective generation mitigates this risk by ensuring that the system answers only when confident. However, real-world deployments typically provide only partial user feedback on the selected response (e.g., thumbs up/down) and often operate in non-stationary or adversarial environments, for which effective learning methods are largely missing. To bridge this gap, we propose ExSUL, a novel online learning framework for selective generation with adversarial bandit feedback. Technically, we introduce (i) a novel conversion lemma that translates the regret of any bandit algorithm into an FDR bound, and (ii) feedback unlocking, a strategy that exploits the structure of selective generation to extract additional learning signals from partial feedback. We prove that ExSUL achieves a regret bound of $O(\sqrt{T \ln |H|})$, matching the efficiency and FDR c...
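To make the idea of extracting extra signal from partial feedback more concrete, here is a minimal Python sketch. It is not ExSUL and is not taken from the paper; it is one possible reading of the abstract, under several assumptions: hypotheses are confidence thresholds, the loss is 1 for a wrong answer and 0 for a correct one, abstaining has a fixed cost (`abstain_cost`), and the learner is a plain exponential-weights update with importance-weighted losses. The names `would_answer`, `unlocked_losses`, and `update_weights` are hypothetical.

```python
import math

def would_answer(confidence, threshold):
    """A threshold hypothesis answers only when the confidence clears it."""
    return confidence >= threshold

def unlocked_losses(thresholds, chosen, confidence, is_correct, abstain_cost=0.5):
    """Losses that can be inferred for every hypothesis from a single round.

    None marks a loss that cannot be observed this round. The 0/1 error
    loss and abstain_cost are illustrative assumptions, not the paper's.
    """
    chosen_answered = would_answer(confidence, thresholds[chosen])
    losses = []
    for tau in thresholds:
        if not would_answer(confidence, tau):
            # An abstaining hypothesis has a known cost; no user feedback needed.
            losses.append(abstain_cost)
        elif chosen_answered:
            # Every answering hypothesis returns the same response, so the
            # single thumbs up/down applies to all of them.
            losses.append(0.0 if is_correct else 1.0)
        else:
            # We abstained, so the response was never shown or rated.
            losses.append(None)
    return losses

def update_weights(weights, thresholds, confidence, losses, eta):
    """One exponential-weights step using importance-weighted loss estimates."""
    total = sum(weights)
    probs = [w / total for w in weights]
    # Probability that the sampled hypothesis answers, i.e. that feedback arrives.
    p_answer = sum(p for p, tau in zip(probs, thresholds)
                   if would_answer(confidence, tau))
    updated = []
    for w, tau, loss in zip(weights, thresholds, losses):
        if loss is None:
            estimate = 0.0              # unobserved: no update this round
        elif would_answer(confidence, tau):
            estimate = loss / p_answer  # observed only when feedback arrives
        else:
            estimate = loss             # abstention cost is always observed
        updated.append(w * math.exp(-eta * estimate))
    top = max(updated)
    return [w / top for w in updated]   # rescale to avoid underflow

# One round: the lowest threshold is chosen, answers, and gets a thumbs down.
thresholds = [0.2, 0.5, 0.8]
losses = unlocked_losses(thresholds, chosen=0, confidence=0.6, is_correct=False)
weights = update_weights([1.0, 1.0, 1.0], thresholds, 0.6, losses, eta=0.1)
```

In this toy round a single rating updates all three hypotheses, whereas a standard bandit update would touch only the chosen one. A step size on the order of $\sqrt{\ln |H| / T}$ is the usual choice for exponential weights; whether the paper uses that setting is not stated in the abstract.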

Originally published on March 06, 2026. Curated by AI News.
