[2602.12714] ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning


Summary

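The paper introduces ADEPT, a framework that recasts speech emotion recognition as a multi-turn inquiry rather than a single-pass prediction. A Speech LLM acts as an agent that collects semantic and acoustic evidence through dedicated probing tools, with the aim of improving accuracy while shifting from consensus learning to ambiguity-driven emotion reasoning.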

Why It Matters

Speech LLMs can reason about emotion at a high level but often produce text-biased judgments with no verifiable acoustic evidence, while self-supervised speech encoders such as WavLM provide strong acoustic representations but are hard to interpret. ADEPT aims to bridge the two through structured, evidence-based inquiry, making emotional assessments both more accurate and more interpretable. This is particularly relevant as emotion recognition becomes increasingly integrated into AI applications.

Key Takeaways

  • ADEPT reframes emotion recognition as a multi-turn inquiry process rather than a single-pass prediction.
  • The framework improves accuracy for both primary and minor emotions.
  • It integrates acoustic and semantic probing tools for evidence-based reasoning.
  • Minority annotations are treated as valuable signals rather than noise.
  • Group Relative Policy Optimization enhances prediction quality by coupling tool usage with evidence.
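
For readers unfamiliar with Group Relative Policy Optimization, the sketch below shows only its group-relative advantage step: several responses are sampled for the same input, each receives a scalar reward, and rewards are normalized within that group. The reward values and the function name `group_relative_advantages` are illustrative assumptions; how ADEPT scores tool usage and evidence is not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for being better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same utterance.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to supply a baseline.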

Computer Science > Machine Learning
arXiv:2602.12714 (cs)
[Submitted on 13 Feb 2026]

Title: ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning

Authors: Esther Sun, Bo-Hao Su, Abinay Reddy Naini, Shinji Watanabe, Carlos Busso

Abstract: Speech Large Language Models (SLLMs) enable high-level emotion reasoning but often produce ungrounded, text-biased judgments without verifiable acoustic evidence. In contrast, self-supervised speech encoders such as WavLM provide strong acoustic representations yet remain opaque discriminative models with limited interpretability. To bridge this gap, we introduce ADEPT (Agentic Decoding of Emotion via Evidence Probing Tools), a framework that reframes emotion recognition as a multi-turn inquiry process rather than a single-pass prediction. ADEPT transforms an SLLM into an agent that maintains an evolving candidate emotion set and adaptively invokes dedicated semantic and acoustic probing tools within a structured pipeline of candidate generation, evidence collection, and adjudication. Crucially, ADEPT enables a paradigm shift from consensus learning to ambiguity-driven emotion reasoning. Since human affect exhibits inherent complex...
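
The three stages named in the abstract (candidate generation, evidence collection, adjudication) can be pictured with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_inquiry`, the stub tools, the averaging rule, and the 0.3 threshold are all hypothetical, and a real agent would choose which tools to call adaptively over several turns instead of calling every tool once.

```python
from typing import Callable, Dict, List

# Hypothetical tool signature: (utterance, emotion) -> support score in [0, 1].
Tool = Callable[[str, str], float]


def run_inquiry(utterance: str,
                propose: Callable[[str], List[str]],
                tools: Dict[str, Tool],
                keep_threshold: float = 0.3) -> dict:
    # Stage 1: candidate generation (initial candidate emotion set).
    candidates = propose(utterance)

    # Stage 2: evidence collection (probe every candidate with every tool).
    evidence = {
        emo: {name: tool(utterance, emo) for name, tool in tools.items()}
        for emo in candidates
    }

    # Stage 3: adjudication (average the evidence, prune weak candidates).
    scores = {emo: sum(ev.values()) / len(ev) for emo, ev in evidence.items()}
    kept = sorted((e for e in scores if scores[e] >= keep_threshold),
                  key=lambda e: -scores[e])
    return {
        "primary": kept[0] if kept else None,
        "minor": kept[1:],  # surviving secondary emotions, not discarded as noise
        "evidence": evidence,
    }


# Stub components with made-up scores, standing in for an SLLM and real probes.
SEMANTIC = {"anger": 0.8, "sadness": 0.4, "joy": 0.1}
ACOUSTIC = {"anger": 0.6, "sadness": 0.4, "joy": 0.1}

result = run_inquiry(
    "I can't believe you did that.",
    propose=lambda utt: ["anger", "sadness", "joy"],
    tools={
        "semantic": lambda utt, emo: SEMANTIC[emo],
        "acoustic": lambda utt, emo: ACOUSTIC[emo],
    },
)
```

Returning the per-tool evidence alongside the decision is what makes the outcome inspectable: each kept or pruned emotion can be traced to the scores that supported it.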
