[2601.19922] HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

arXiv - AI · 4 min read

Summary

The paper introduces HEART, a benchmark for evaluating emotional support dialogue in humans and LLMs, focusing on empathy and communication skills.

Why It Matters

HEART addresses the gap in assessing emotional support capabilities of LLMs compared to humans. By providing a standardized framework, it enhances understanding of AI's role in supportive conversations, which is crucial as AI systems increasingly interact with users in sensitive contexts.

Key Takeaways

  • HEART is the first framework to compare human and LLM responses in emotional dialogues.
  • It evaluates interactions based on five dimensions of interpersonal communication.
  • Some LLMs show empathy comparable to humans', while humans still excel at nuanced emotional responses.
  • The study reveals a convergence in assessment criteria between human and LLM evaluators.
  • HEART provides a foundation for future research on emotional competence in AI.

Computer Science > Computation and Language
arXiv:2601.19922 (cs)
[Submitted on 9 Jan 2026 (v1), last revised 25 Feb 2026 (this version, v2)]

Title: HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue
Authors: Laya Iyer, Kriti Aggarwal, Sanmi Koyejo, Gail Heyman, Desmond C. Ong, Subhabrata Mukherjee

Abstract: Supportive conversation depends on skills that go beyond language fluency, including reading emotions, adjusting tone, and navigating moments of resistance, frustration, or distress. Despite rapid progress in language models, we still lack a clear way to understand how their abilities in these interpersonal domains compare to those of humans. We introduce HEART, the first-ever framework that directly compares humans and LLMs on the same multi-turn emotional-support conversations. For each dialogue history, we pair human and model responses and evaluate them through blinded human raters and an ensemble of LLM-as-judge evaluators. All assessments follow a rubric grounded in interpersonal communication science across five dimensions: Human Alignment, Empathic Responsiveness, Attunement, Resonance, and Task-Following. HEART uncovers striking behavioral patterns. Several frontier models approach or surpass the average human responses in perceive...
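The evaluation protocol described in the abstract (pair each human and model response to the same dialogue history, score both on the five rubric dimensions, and aggregate across an ensemble of judges) can be sketched in a few lines. This is an illustrative sketch only, not the paper's code: the 1-to-5 scale, the judge names, and aggregation by simple mean are all assumptions.

```python
from statistics import mean

# The five HEART rubric dimensions named in the abstract.
DIMENSIONS = [
    "Human Alignment",
    "Empathic Responsiveness",
    "Attunement",
    "Resonance",
    "Task-Following",
]

def aggregate_judge_scores(ratings):
    """Average each dimension across an ensemble of judges.

    ratings: {judge_name: {dimension: score}}, scores on an
    assumed 1-5 scale.
    """
    return {d: mean(r[d] for r in ratings.values()) for d in DIMENSIONS}

def compare_pair(human, model):
    """Per-dimension preference for one paired (human, model) response."""
    verdict = {}
    for d in DIMENSIONS:
        if human[d] > model[d]:
            verdict[d] = "human"
        elif model[d] > human[d]:
            verdict[d] = "model"
        else:
            verdict[d] = "tie"
    return verdict

# Two hypothetical judges scoring the same model response.
judges = {
    "judge_a": {"Human Alignment": 4, "Empathic Responsiveness": 5,
                "Attunement": 3, "Resonance": 4, "Task-Following": 5},
    "judge_b": {"Human Alignment": 4, "Empathic Responsiveness": 4,
                "Attunement": 4, "Resonance": 4, "Task-Following": 5},
}
model_scores = aggregate_judge_scores(judges)

# Hypothetical blinded-rater scores for the human response to the
# same dialogue history.
human_scores = {"Human Alignment": 5, "Empathic Responsiveness": 4,
                "Attunement": 4.5, "Resonance": 4, "Task-Following": 4}
print(compare_pair(human_scores, model_scores))
```

Comparing responses per dimension rather than with a single overall score mirrors the paper's finding that models and humans can each lead on different interpersonal skills.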
