[2602.13699] Attention Head Entropy of LLMs Predicts Answer Correctness
Summary
This paper introduces Head Entropy, a method that predicts answer correctness in large language models (LLMs) from the entropy of their attention distributions, outperforming existing white-box baselines, particularly on out-of-domain data.
Why It Matters
As LLMs are increasingly used in critical applications, ensuring their reliability is vital. This research provides a novel approach to assess answer correctness, potentially reducing risks associated with incorrect outputs in high-stakes environments like healthcare.
Key Takeaways
- Head Entropy effectively predicts answer correctness using attention patterns.
- It outperforms the closest baseline by an average of +8.5% AUROC on out-of-domain data.
- Attention patterns prior to answer generation carry significant predictive signals.
- The method generalizes better to out-of-domain data compared to traditional approaches.
- Evaluation spans multiple instruction-tuned LLMs and diverse QA datasets.
Computer Science > Machine Learning
arXiv:2602.13699 (cs) [Submitted on 14 Feb 2026]
Authors: Sophie Ostmeier, Brian Axelrod, Maya Varma, Asad Aali, Yabin Zhang, Magdalini Paschali, Sanmi Koyejo, Curtis Langlotz, Akshay Chaudhari
Abstract: Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations from model internals, focusing on where the attention mass is localized, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out of domain? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Rényi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better out of domain, outperforming the closest baseline by +8.5% AUROC on average. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal using Head En...
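The pipeline named in the abstract, per-head 2-Rényi entropies fed to a sparse (L1-penalized) logistic regression, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the tensor shapes, the mean aggregation over query positions, the random toy data, and the regularization strength are all assumptions made here for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def renyi2_entropy(p, eps=1e-12):
    """2-Renyi (collision) entropy of a probability vector: -log(sum_i p_i^2).

    Low entropy = attention concentrated on few tokens; high = spread out.
    """
    return -np.log(np.sum(np.square(p)) + eps)

def head_entropy_features(attn):
    """attn: attention weights of shape (layers, heads, query_len, key_len),
    each row attn[l, h, q] summing to 1. Returns one feature per head:
    the 2-Renyi entropy averaged over query positions (an assumed aggregation).
    """
    n_layers, n_heads, n_queries, _ = attn.shape
    feats = np.empty(n_layers * n_heads)
    for l in range(n_layers):
        for h in range(n_heads):
            feats[l * n_heads + h] = np.mean(
                [renyi2_entropy(attn[l, h, q]) for q in range(n_queries)]
            )
    return feats

# Toy demo: random Dirichlet "attention" tensors (4 layers, 8 heads,
# 10 query positions, 16 key positions) and random correctness labels.
rng = np.random.default_rng(0)
X = np.stack([
    head_entropy_features(rng.dirichlet(np.ones(16), size=(4, 8, 10)))
    for _ in range(40)
])
y = rng.integers(0, 2, size=40)

# Sparse logistic regression over per-head entropy features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
```

In practice the attention tensors would come from the model itself (e.g., requesting attention weights from a transformer forward pass), with one labeled example per generated answer; the L1 penalty drives most head coefficients to zero, selecting a small set of predictive heads.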