[2602.16802] References Improve LLM Alignment in Non-Verifiable Domains
Summary
This paper investigates whether reference-guided LLM evaluators can serve as soft "verifiers" for alignment in non-verifiable domains, showing that reference outputs improve judge accuracy and that reference-guided self-improvement outperforms both direct SFT on references and self-improvement with reference-free judges.
Why It Matters
As LLMs become integral to a wide range of applications, ensuring their alignment in non-verifiable domains is crucial for reliability and effectiveness. This research offers a practical way to extend verifier-style training signals to such settings, with implications for AI safety and post-training practice.
Key Takeaways
- Reference-guided evaluators can significantly improve LLM alignment in non-verifiable domains.
- The study designs evaluation protocols that condition LLM-based evaluators on reference outputs (see the sketch after this list).
- References from frontier models raise the accuracy of less capable LLM judges, and high-quality human-written references help even stronger judges.
- Reference-guided self-improvement outperforms both direct SFT on reference outputs and self-improvement with reference-free judges.
- The findings suggest a promising direction for LLM post-training in complex environments.
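The paper's exact prompts and rubric are not reproduced above, so the following is only a minimal sketch of what a reference-guided judge could look like. The prompt wording, the 1-10 rating scale, and the generic call_llm helper are all assumptions for illustration, not the authors' protocol.

```python
# Minimal sketch of a reference-guided LLM judge (not the paper's exact protocol).
# `call_llm` is a hypothetical helper that sends a prompt to any chat model and
# returns its text reply; the prompt wording and 1-10 rubric are assumptions.
import re

JUDGE_PROMPT = """You are evaluating a response to a user instruction.
A high-quality reference answer is provided as a soft "verifier": use it to
judge correctness, coverage, and helpfulness, but do not require the response
to match it word-for-word.

Instruction:
{instruction}

Reference answer:
{reference}

Candidate response:
{candidate}

Rate the candidate from 1 (poor) to 10 (excellent). Reply with only the number."""


def reference_guided_score(call_llm, instruction: str, reference: str, candidate: str) -> int:
    """Score a candidate response against a reference output using an LLM judge."""
    prompt = JUDGE_PROMPT.format(
        instruction=instruction, reference=reference, candidate=candidate
    )
    reply = call_llm(prompt)
    match = re.search(r"\d+", reply)
    # Fall back to the lowest score if the judge's reply cannot be parsed.
    return int(match.group()) if match else 1
```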
Computer Science > Computation and Language
arXiv:2602.16802 (cs) [Submitted on 18 Feb 2026]
Title: References Improve LLM Alignment in Non-Verifiable Domains
Authors: Kejian Shi, Yixin Liu, Peifeng Wang, Alexander R. Fabbri, Shafiq Joty, Arman Cohan
Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has shown strong effectiveness in reasoning tasks, it cannot be directly applied to non-verifiable domains lacking ground-truth verifiers, such as LLM alignment. In this work, we investigate whether reference-guided LLM-evaluators can bridge this gap by serving as soft "verifiers". First, we design evaluation protocols that enhance LLM-based evaluators for LLM alignment using reference outputs. Through comprehensive experiments, we show that a reference-guided approach substantially improves the accuracy of less capable LLM-judges using references from frontier models; stronger LLM-judges can also be enhanced by high-quality (i.e., human-written) references. Building on these improved judges, we demonstrate the utility of high-quality references in alignment tuning, where LLMs guided with references are used as judges to self-improve. We show that reference-guided self-improvement yields clear gains over both direct SFT on reference outputs and self-improvement with reference-free judges, achieving performance...
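The abstract describes alignment tuning in which reference-guided judges drive self-improvement, but the training recipe is not spelled out above. The loop below is therefore only an illustrative best-of-n sketch built on the hypothetical reference_guided_score judge from the earlier example; the generate and finetune_on helpers, the sample count, and training on the judge's top pick are all assumptions.

```python
# Illustrative sketch of reference-guided self-improvement (not the paper's recipe).
# `policy.generate`, `policy.finetune_on`, and `judge_llm` are hypothetical helpers;
# sampling 4 candidates and keeping the judge's top pick are assumptions.
def self_improve(policy, judge_llm, dataset, n_samples: int = 4):
    selected = []
    for example in dataset:
        instruction, reference = example["instruction"], example["reference"]
        # Sample several candidate responses from the current policy.
        candidates = [policy.generate(instruction) for _ in range(n_samples)]
        # Score each candidate with the reference-guided judge defined earlier.
        scores = [
            reference_guided_score(judge_llm, instruction, reference, c)
            for c in candidates
        ]
        # Keep the highest-scoring response as a training target.
        best = candidates[scores.index(max(scores))]
        selected.append({"instruction": instruction, "response": best})
    # Fine-tune the policy on its own judge-selected outputs.
    return policy.finetune_on(selected)
```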