[2604.02766] Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs
Computer Science > Machine Learning

arXiv:2604.02766 (cs) [Submitted on 3 Apr 2026]

Title: Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs
Authors: Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh

Abstract: Modern LLMs inherit strong priors from web-scale pretraining, which can limit the headroom of post-training data-selection strategies. While Active Preference Learning (APL) seeks to optimize query efficiency in online Direct Preference Optimization (DPO), the inherent richness of on-policy candidate pools often renders simple Random sampling a surprisingly formidable baseline. We evaluate uncertainty-based APL against Random across harmlessness, helpfulness, and instruction-following settings, utilizing both reward models and LLM-as-a-judge proxies. We find that APL yields negligible improvements in proxy win-rates compared to Random. Crucially, we observe a dissociation where win-rate improves even as general capability -- measured by standard benchmarks -- degrades. APL fails to mitigate this capability collapse or reduce variance significantly better than random sampling. Our findings suggest that in the regime of strong pre-trained priors, the computational overhead of active selection is difficult to justify against the "cheap diversity" provided by s...
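The contrast the abstract draws between uncertainty-based APL and Random can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it assumes a Bradley-Terry preference model over proxy-reward scores and treats the binary entropy of the preference probability as the uncertainty signal, so the "active" strategy picks the candidate pair whose reward margin is closest to zero, while the baseline picks a pair uniformly at random. All function names and the toy reward scores are illustrative.

```python
import math
import random


def preference_uncertainty(r_a: float, r_b: float) -> float:
    """Binary entropy of the Bradley-Terry preference probability.

    Pairs whose proxy-reward margin is near zero are maximally uncertain.
    """
    p = 1.0 / (1.0 + math.exp(-(r_a - r_b)))  # P(a preferred over b)
    eps = 1e-12  # guard against log(0)
    return -(p * math.log(p + eps) + (1.0 - p) * math.log(1.0 - p + eps))


def select_pair_random(n: int, rng: random.Random) -> tuple[int, int]:
    """Baseline: choose a candidate pair uniformly at random."""
    i, j = rng.sample(range(n), 2)
    return i, j


def select_pair_uncertain(rewards: list[float]) -> tuple[int, int]:
    """Active selection: score every pair, keep the most uncertain one."""
    best_u, best_pair = -1.0, (0, 1)
    for i in range(len(rewards)):
        for j in range(i + 1, len(rewards)):
            u = preference_uncertainty(rewards[i], rewards[j])
            if u > best_u:
                best_u, best_pair = u, (i, j)
    return best_pair


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy proxy-reward scores for 8 on-policy candidate completions.
    rewards = [rng.gauss(0.0, 1.0) for _ in range(8)]
    i, j = select_pair_uncertain(rewards)
    print("uncertain pair margin:", abs(rewards[i] - rewards[j]))
    print("random pair:", select_pair_random(len(rewards), rng))
```

In an online DPO loop, the selected pair would be labeled by the proxy (reward model or LLM judge) and used as the chosen/rejected example for the next update; the paper's finding is that this extra pair-scoring pass buys little over the random baseline.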