[2602.18763] TAG: Thinking with Action Unit Grounding for Facial Expression Recognition

arXiv - AI · 4 min read

Summary

The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that constrains the model's reasoning to be grounded in facial Action Units (AUs), improving cross-dataset robustness and reducing hallucinated rationales in its outputs.

Why It Matters

This research addresses the limitations of current vision-language models in FER by ensuring that predictions are supported by verifiable visual evidence. By grounding reasoning in facial Action Units, TAG enhances the reliability of FER systems, which is crucial for applications in emotion analysis, human-computer interaction, and AI safety.

Key Takeaways

  • TAG improves Facial Expression Recognition by grounding predictions in facial Action Units (a generic AU-to-expression sketch follows this list).
  • The model reduces hallucinations and improves the visual faithfulness of its outputs.
  • It outperforms existing vision-language model baselines on multiple datasets.
  • Intermediate reasoning steps, made verifiable through AU grounding, are crucial for trustworthy multimodal reasoning.
  • The approach demonstrates the value of structured intermediate representations (here, Action Units) for reliable multimodal models.
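
For intuition, here is a minimal, hypothetical sketch of what "grounding a prediction in Action Units" can look like. The AU-to-expression prototypes below follow standard FACS/EMFACS associations from the FER literature; they are not taken from the paper, whose grounding mechanism operates over reasoning traces and facial regions rather than a fixed lookup.

```python
# Hypothetical illustration (not the paper's method): choose the expression
# whose prototypical Action Unit (AU) combination best matches detected AUs.
# Prototypes follow common FACS/EMFACS associations.
from typing import Dict, FrozenSet

EXPRESSION_PROTOTYPES: Dict[str, FrozenSet[int]] = {
    "happiness": frozenset({6, 12}),        # cheek raiser + lip corner puller
    "sadness":   frozenset({1, 4, 15}),     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  frozenset({1, 2, 5, 26}),  # brow raisers + upper lid raiser + jaw drop
    "anger":     frozenset({4, 5, 7, 23}),  # brow lowerer + lid tighteners + lip tightener
}

def ground_expression(detected_aus: FrozenSet[int]) -> str:
    """Return the expression whose AU prototype best overlaps the detected AUs."""
    def overlap(proto: FrozenSet[int]) -> float:
        return len(proto & detected_aus) / len(proto)
    return max(EXPRESSION_PROTOTYPES, key=lambda e: overlap(EXPRESSION_PROTOTYPES[e]))

print(ground_expression(frozenset({6, 12, 25})))  # -> "happiness"
```

The point of such a mapping is that the label is checkable: every prediction traces back to specific AUs, which is the property TAG enforces for free-form reasoning.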

Computer Science > Computer Vision and Pattern Recognition · arXiv:2602.18763 (cs) · Submitted on 21 Feb 2026

Title: TAG: Thinking with Action Unit Grounding for Facial Expression Recognition

Authors: Haobo Lin, Tianyi Bai, Jiajun Zhang, Xuanhao Chang, Sheng Lu, Fangming Gu, Zengjie Hu, Wentao Zhang

Abstract: Facial Expression Recognition (FER) is a fine-grained visual understanding task where reliable predictions require reasoning over localized and meaningful facial cues. Recent vision-language models (VLMs) enable natural language explanations for FER, but their reasoning is often ungrounded, producing fluent yet unverifiable rationales that are weakly tied to visual evidence and prone to hallucination, leading to poor robustness across different datasets. We propose TAG (Thinking with Action Unit Grounding), a vision-language framework that explicitly constrains multimodal reasoning to be supported by facial Action Units (AUs). TAG requires intermediate reasoning steps to be grounded in AU-related facial regions, yielding predictions accompanied by verifiable visual evidence. The model is trained via supervised fine-tuning on AU-grounded reasoning traces, followed by reinforcement learning with an AU-aware reward that aligns predicted regions with external AU detectors. Evaluated on RAF-DB, FERPlus, and AffectNet, TAG consiste...
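
The abstract's two-stage recipe (supervised fine-tuning on AU-grounded reasoning traces, then reinforcement learning with an AU-aware reward) can be made concrete with a small sketch. Everything below is an assumption for illustration only: the function names, the IoU-based region agreement, the 0.5 threshold, and the accuracy/grounding weighting are not specified in this summary and may differ from the paper's actual reward.

```python
# A sketch of an AU-aware RL reward: pay the policy for a correct expression
# label AND for citing AU regions that an external AU detector agrees with.
# All names, thresholds, and weights here are illustrative assumptions.
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    def area(r: Box) -> float:
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def au_aware_reward(
    predicted_regions: Dict[int, Box],  # AU id -> region cited in the model's reasoning
    detector_boxes: Dict[int, Box],     # AU id -> region from an external AU detector
    correct_label: bool,                # did the final expression prediction match?
    iou_threshold: float = 0.5,
    grounding_weight: float = 0.5,
) -> float:
    """Blend task accuracy with agreement between cited regions and the detector."""
    if predicted_regions:
        hits = sum(
            1 for au, box in predicted_regions.items()
            if au in detector_boxes and iou(box, detector_boxes[au]) >= iou_threshold
        )
        grounding = hits / len(predicted_regions)
    else:
        grounding = 0.0
    return (1 - grounding_weight) * float(correct_label) + grounding_weight * grounding

# Example: one cited AU 12 region closely matching the detector -> reward 1.0
print(au_aware_reward({12: (0.40, 0.60, 0.70, 0.80)},
                      {12: (0.42, 0.58, 0.72, 0.82)},
                      correct_label=True))
```

Splitting the reward this way means a fluent but unverifiable rationale earns less than one whose cited regions an independent detector can confirm, which is the incentive the abstract describes.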
