[2604.00200] Offline Constrained RLHF with Multiple Preference Oracles


Computer Science > Machine Learning
arXiv:2604.00200 (cs.LG), submitted on 31 Mar 2026

Title: Offline Constrained RLHF with Multiple Preference Oracles
Authors: Brenden Latham, Mehrdad Moharrami

Abstract: We study offline constrained reinforcement learning from human feedback with multiple preference oracles. Motivated by applications that trade off performance with safety or fairness, we aim to maximize target-population utility subject to a minimum protected-group welfare constraint. From pairwise comparisons collected under a reference policy, we estimate oracle-specific rewards via maximum likelihood and analyze how statistical uncertainty propagates through the dual program. We cast the constrained objective as a KL-regularized Lagrangian whose primal optimizer is a Gibbs policy, reducing learning to a convex dual problem. We propose a dual-only algorithm that ensures high-probability constraint satisfaction and provide the first finite-sample performance guarantees for offline constrained preference learning. Finally, we extend our theoretical analysis to accommodate multiple constraints and general f-divergence regularization.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2604.00200 [cs.LG] (arXiv:2604.00200v1 for this version), https://doi.org/10.48550/arXiv.2604.00200
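The dual-only approach described in the abstract can be sketched in a simple discrete-action setting. Below is a minimal, illustrative implementation, not the paper's algorithm: it assumes the two oracle reward estimates (a target-population reward `r0` and a protected-group reward `r1`) have already been fitted by maximum likelihood, and it exploits the fact that the Gibbs policy's protected-group welfare is nondecreasing in the dual variable, so the constrained problem reduces to a one-dimensional search. All function names and numbers are hypothetical.

```python
import numpy as np

def gibbs_policy(r0, r1, lam, pi_ref, beta):
    # Primal optimizer of the KL-regularized Lagrangian:
    # pi(a) proportional to pi_ref(a) * exp((r0(a) + lam * r1(a)) / beta)
    logits = (r0 + lam * r1) / beta + np.log(pi_ref)
    logits -= logits.max()              # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def solve_dual(r0, r1, pi_ref, beta, b, lam_max=100.0, tol=1e-8):
    # Bisection on the dual variable: increasing lam tilts the Gibbs
    # policy toward high protected-group reward, so E_pi[r1] is
    # nondecreasing in lam. Find the smallest lam >= 0 meeting the
    # welfare constraint E_pi[r1] >= b.
    def welfare(lam):
        return gibbs_policy(r0, r1, lam, pi_ref, beta) @ r1
    if welfare(0.0) >= b:
        return 0.0                      # constraint inactive
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if welfare(mid) >= b:
            hi = mid
        else:
            lo = mid
    return hi

# Toy example with made-up numbers: 4 actions, uniform reference policy.
r0 = np.array([1.0, 0.8, 0.2, 0.0])    # target-population reward estimates
r1 = np.array([0.0, 0.5, 1.0, 0.9])    # protected-group reward estimates
pi_ref = np.full(4, 0.25)
beta, b = 0.5, 0.6                      # KL regularization strength, welfare floor
lam_star = solve_dual(r0, r1, pi_ref, beta, b)
pi_star = gibbs_policy(r0, r1, lam_star, pi_ref, beta)
```

At `lam_star` the resulting Gibbs policy meets the protected-group welfare floor while trading off as little target utility as possible; the paper's contribution is making this work with estimated (rather than true) rewards, with high-probability constraint satisfaction and finite-sample guarantees.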

Originally published on April 02, 2026. Curated by AI News.

Related Articles

- The public needs to control AI-run infrastructure, labor, education, and governance, NOT private actors (Reddit - Artificial Intelligence, 1 min)
- China drafts law regulating 'digital humans' and banning addictive virtual services for children (Reddit - Artificial Intelligence, 1 min)
- [2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion (arXiv - AI, 3 min)
- [2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models (arXiv - AI, 4 min)