[2503.08796] Robust Multi-Objective Controlled Decoding of Large Language Models
Summary
This article summarizes Robust Multi-Objective Decoding (RMOD), an inference-time algorithm that aligns Large Language Models (LLMs) with multiple human objectives by maximizing the worst-case reward.
Why It Matters
As LLMs become integral to a wide range of applications, ensuring they meet diverse human objectives such as safety and helpfulness is crucial. RMOD offers a robust way to optimize LLM outputs at decoding time, making it relevant for developers and researchers focused on AI alignment and safety.
Key Takeaways
- RMOD aligns LLMs with multiple objectives by maximizing the worst-case reward across them.
- The algorithm formulates robust decoding as a maximin two-player game between adversarial reward weights and the sampling policy, solved at a Nash equilibrium: a convex optimization problem yields the worst-case weights, and the optimal sampling policy is derived analytically.
- RMOD introduces minimal computational overhead compared to standard non-robust Controlled Decoding methods.
- On popular alignment datasets with up to 10 objectives, RMOD and its distilled version consistently outperform baselines in worst-case reward.
- The approach enhances LLM alignment across various practical applications.
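The takeaways above can be sketched as code. The snippet below is a minimal, self-contained illustration of a robust decoding step, not the authors' implementation: it assumes a KL-tilted policy of the form pi ∝ pi_ref · exp(Σ_i w_i r_i / β) (standard in Controlled Decoding) and finds the worst-case weights on the simplex by simple alternating best response and exponentiated-gradient descent; the function name `rmod_step` and all hyperparameters are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rmod_step(ref_logprobs, rewards, beta=1.0, iters=200, lr=0.5):
    """Sketch of one robust decoding step over candidate tokens.

    ref_logprobs: base-policy log-probs, one per candidate token.
    rewards: rewards[i][k] = reward of objective i for candidate k.
    Returns (worst-case weights w on the simplex, tilted policy pi).
    """
    n_obj = len(rewards)
    n_tok = len(ref_logprobs)
    w = [1.0 / n_obj] * n_obj  # start from uniform weights
    for _ in range(iters):
        # Best-response policy for the current weights (analytic tilt).
        combined = [sum(w[i] * rewards[i][k] for i in range(n_obj))
                    for k in range(n_tok)]
        pi = softmax([lp + c / beta
                      for lp, c in zip(ref_logprobs, combined)])
        # Gradient of the weighted expected reward w.r.t. w_i is
        # E_pi[r_i]; the adversary minimizes it, so take an
        # exponentiated-gradient step downhill and renormalize.
        grads = [sum(p * rewards[i][k] for k, p in enumerate(pi))
                 for i in range(n_obj)]
        w = [wi * math.exp(-lr * g) for wi, g in zip(w, grads)]
        z = sum(w)
        w = [wi / z for wi in w]
    combined = [sum(w[i] * rewards[i][k] for i in range(n_obj))
                for k in range(n_tok)]
    pi = softmax([lp + c / beta for lp, c in zip(ref_logprobs, combined)])
    return w, pi
```

As a sanity check, if one objective is well rewarded everywhere and another is not, the adversarial weights concentrate on the poorly served objective, so sampling shifts toward tokens that protect it.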
Computer Science > Machine Learning
arXiv:2503.08796 (cs)
[Submitted on 11 Mar 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Robust Multi-Objective Controlled Decoding of Large Language Models
Authors: Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
Abstract: We introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that robustly aligns Large Language Models (LLMs) to multiple human objectives (e.g., instruction-following, helpfulness, safety) by maximizing the worst-case rewards. RMOD formulates the robust decoding problem as a maximin two-player game between adversarially computed reward weights and the sampling policy, solvable through a Nash equilibrium. We demonstrate that this game reduces to a convex optimization problem to identify the worst-case reward weights, with the optimal sampling policy analytically derived. For practical applications, we propose an efficient algorithm of RMOD tailored for contemporary LLMs, introducing minimal computational overhead compared to standard non-robust Controlled Decoding methods. Experimental results across a range of popular alignment datasets with up to 10 objectives show the effectiveness of RMOD and its distilled version, consistently outperforming baselines in worst-case...
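The maximin game in the abstract can be written out explicitly. A common way to formalize it (assuming a KL-regularized controlled-decoding objective with strength β, which the excerpt does not state; the regularizer here is an assumption) is:

```latex
\max_{\pi} \; \min_{w \in \Delta^{K}} \;
\sum_{i=1}^{K} w_i \, \mathbb{E}_{y \sim \pi}\!\left[ r_i(x, y) \right]
\;-\; \beta \, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\mathrm{ref}} \right)
```

Here $\Delta^{K}$ is the probability simplex over the $K$ objectives, $r_i$ the $i$-th reward, and $\pi_{\mathrm{ref}}$ the base model. The inner minimization over $w$ is convex, and for fixed worst-case weights $w^{*}$ the outer maximization has the closed-form tilt $\pi(y \mid x) \propto \pi_{\mathrm{ref}}(y \mid x) \exp\!\big( \tfrac{1}{\beta} \sum_i w^{*}_i r_i(x, y) \big)$, matching the "analytically derived" sampling policy the abstract mentions.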