[2602.13699] Attention Head Entropy of LLMs Predicts Answer Correctness

arXiv - Machine Learning

Summary

This paper introduces Head Entropy, a method that predicts whether a large language model's (LLM's) answer is correct by analyzing the entropy of its attention patterns, and shows that it outperforms existing white-box methods.

Why It Matters

As LLMs are increasingly used in critical applications, ensuring their reliability is vital. This research provides a novel approach to assess answer correctness, potentially reducing risks associated with incorrect outputs in high-stakes environments like healthcare.

Key Takeaways

  • Head Entropy effectively predicts answer correctness using attention patterns.
  • It outperforms existing methods by an average of +8.5% AUROC.
  • Attention patterns prior to answer generation carry significant predictive signals.
  • The method generalizes better to out-of-domain data compared to traditional approaches.
  • Evaluation spans multiple instruction-tuned LLMs and diverse QA datasets.
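Conceptually, Head Entropy measures how concentrated or spread out each attention head's weight distribution is. A minimal sketch of the 2-Rényi (collision) entropy of a single attention distribution, H2(p) = -log(Σ p_i²), using NumPy (the toy distributions below are illustrative, not from the paper):

```python
import numpy as np

def renyi2_entropy(attn_weights):
    """2-Renyi entropy of one attention distribution: H2(p) = -log(sum_i p_i^2)."""
    p = np.asarray(attn_weights, dtype=np.float64)
    p = p / p.sum()  # normalize defensively in case weights don't sum to 1
    return -np.log(np.sum(p ** 2))

# A peaked (localized) distribution has low entropy;
# a spread-out (uniform) distribution has high entropy.
peaked = renyi2_entropy([0.97, 0.01, 0.01, 0.01])
uniform = renyi2_entropy([0.25, 0.25, 0.25, 0.25])  # = log(4)
```

For a uniform distribution over n tokens, H2 equals log(n), the maximum; a head that attends to a single token has entropy near zero. These per-head scalars are the features the paper feeds into its classifier.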

Computer Science > Machine Learning

arXiv:2602.13699 (cs) · Submitted on 14 Feb 2026

Title: Attention Head Entropy of LLMs Predicts Answer Correctness

Authors: Sophie Ostmeier, Brian Axelrod, Maya Varma, Asad Aali, Yabin Zhang, Magdalini Paschali, Sanmi Koyejo, Curtis Langlotz, Akshay Chaudhari

Abstract: Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations from model internals, focusing on where the attention mass is localized, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out of domain? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Renyi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better out of domain, outperforming the closest baseline by +8.5% AUROC on average. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal using Head En...
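The abstract's "sparse logistic regression on per-head 2-Renyi entropies" can be sketched with an L1-penalized logistic regression fit by proximal gradient descent. Everything below is a toy illustration under assumed shapes (32 heads, synthetic entropy features, signal planted in two heads), not the paper's implementation:

```python
import numpy as np

def fit_sparse_logreg(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient (ISTA).
    The L1 penalty zeroes out most weights, selecting the few heads
    whose entropy is predictive of answer correctness."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        g_w = X.T @ (p - y) / n                 # gradient of mean log-loss
        g_b = np.mean(p - y)
        w -= lr * g_w
        b -= lr * g_b
        # soft-thresholding: the proximal step for the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b

# Synthetic data: rows are answers, columns are per-head 2-Renyi entropies.
# Only heads 0 and 3 carry signal (a hypothetical setup for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 32))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)
w, b = fit_sparse_logreg(X, y)
scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # correctness probabilities
```

The resulting `scores` play the role of the correctness predictor that the paper evaluates with AUROC; the soft-thresholding step is what makes the regression "sparse", concentrating weight on the informative heads.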
