[2506.04051] High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Summary
The paper presents HALT, a method for finetuning large language models (LLMs) to enhance reliability by generating responses only when confident, thus reducing hallucinations.
Why It Matters
As LLMs become integral in various applications, ensuring their reliability is crucial. HALT addresses the issue of incorrect outputs by aligning model capabilities with response generation, which could significantly improve user trust and application effectiveness in critical fields like medicine and coding.
Key Takeaways
- HALT finetunes LLMs to respond only when confident, reducing hallucinations.
- The method improves correctness of responses by an average of 15%.
- HALT allows a tunable trade-off between response completeness and correctness.
- The finetuned Llama3-70B model achieved 87% correctness while maintaining 53% completeness.
- HALT can be applied across various domains, including coding and medicine.
Computer Science > Computation and Language — arXiv:2506.04051 (cs)
[Submitted on 4 Jun 2025 (v1), last revised 15 Feb 2026 (this version, v2)]
Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa
Abstract: Large Language Models (LLMs) currently respond to every prompt. However, they can produce incorrect answers when they lack knowledge or capability -- a problem known as hallucination. We instead propose post-training an LLM to generate content only when confident in its correctness and to otherwise (partially) abstain. Specifically, our method, HALT, produces capability-aligned post-training data that encodes what the model can and cannot reliably generate. We generate this data by splitting responses of the pretrained LLM into factual fragments (atomic statements or reasoning steps), and use ground truth information to identify incorrect fragments. We achieve capability-aligned finetuning responses by either removing incorrect fragments or replacing them with "Unsure from Here" -- according to a tunable threshold that allows practitioners to trade off response completeness and mean correctness of the response's fragments...
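The data-construction step the abstract describes can be sketched as follows. This is an illustrative reading, not the paper's implementation: the fragment scores stand in for the ground-truth correctness checks, and the function name, threshold semantics, and mode names are assumptions.

```python
def build_halt_target(fragments, frag_scores, threshold=0.8, mode="truncate"):
    """Construct a capability-aligned finetuning target (illustrative sketch).

    fragments:   ordered factual fragments of a pretrained model's response
    frag_scores: estimated probability each fragment is correct
                 (the paper derives correctness from ground truth; here
                 these are assumed to be given)
    threshold:   tunable; higher values abstain more aggressively, trading
                 completeness for mean fragment correctness
    mode:        "remove"   -> drop unreliable fragments, keep the rest
                 "truncate" -> replace the first unreliable fragment with
                               "Unsure from Here." and stop
    """
    target = []
    for frag, score in zip(fragments, frag_scores):
        if score >= threshold:
            target.append(frag)
        elif mode == "truncate":
            # Partial abstention: admit uncertainty and end the response.
            target.append("Unsure from Here.")
            break
        # mode == "remove": silently skip this unreliable fragment
    return " ".join(target)
```

Sweeping `threshold` over a validation set would trace out the completeness-correctness trade-off curve the paper reports.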