[2602.16608] Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
Summary
The paper presents Context-Aware Layer-Wise Integrated Gradients (CA-LIG), a framework that improves the explainability of Transformer models by producing context-sensitive attributions at every layer, yielding clearer interpretations across a range of tasks.
Why It Matters
As Transformer models become increasingly prevalent in AI applications, understanding their decision-making processes is crucial. The CA-LIG framework addresses existing limitations in explainability methods, offering a more nuanced and context-aware approach that can improve trust and transparency in AI systems.
Key Takeaways
- CA-LIG integrates layer-wise attributions with attention gradients for better interpretability.
- The framework captures context-sensitive dependencies, enhancing the understanding of model decisions.
- Evaluated across multiple tasks, CA-LIG outperforms traditional explainability methods.
- Provides clearer visualizations of model attributions, aiding in practical applications.
- Advances the field of explainable AI by addressing key shortcomings in existing techniques.
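CA-LIG builds on Integrated Gradients, which attributes a model's output to its inputs by integrating gradients along a path from a baseline to the input. The sketch below is not the paper's code; it is a minimal NumPy illustration of standard Integrated Gradients on a toy differentiable function with a known analytic gradient, showing the completeness axiom (attributions sum to the difference in model output between the input and the baseline):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - baseline_i) * (1/m) * sum_k grad_f(baseline + a_k * (x - baseline))_i
    where a_k are midpoints of m equal subintervals of [0, 1].
    """
    alphas = (np.arange(1, steps + 1) - 0.5) / steps
    diff = x - baseline
    grads = np.stack([grad_f(baseline + a * diff) for a in alphas])
    return diff * grads.mean(axis=0)  # signed per-feature attributions

# Toy "model": F(x) = sum(x_i^2), with analytic gradient 2x.
f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

print(attr)                          # → approximately [1. 4. 9.]
print(attr.sum(), f(x) - f(baseline))  # completeness: both ≈ 14.0
```

For this quadratic function the gradient is linear along the path, so the midpoint rule recovers the exact attributions x_i². CA-LIG, per the abstract, computes such attributions layer-wise inside each Transformer block rather than only at the input.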
Paper Details
arXiv:2602.16608 (Computer Science > Computation and Language)
Submitted on 18 Feb 2026
Title: Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
Authors: Melkamu Abay Mersha, Jugal Kalita
Abstract: Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose the Context-Aware Layer-wise Integrated Gradients (CA-LIG) framework, a unified hierarchical attribution method that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We eval...
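The abstract describes fusing per-layer Integrated-Gradients attributions with class-specific attention gradients, but does not spell out the fusion rule. As a hedged illustration only, the sketch below uses one plausible scheme: weight each layer's signed IG scores by that layer's normalized attention-gradient saliency and sum over layers, preserving the sign of the evidence. All array shapes and the weighting rule are assumptions, not the paper's method:

```python
import numpy as np

def fuse_attributions(layer_ig, attn_grad):
    """Fuse per-layer IG token attributions with attention-gradient weights.

    layer_ig:  (L, T) signed Integrated-Gradients attribution per layer and token.
    attn_grad: (L, T) non-negative class-specific attention-gradient saliency.

    Illustrative fusion (an assumption, not the paper's rule): normalize the
    attention-gradient saliency within each layer, use it to weight that
    layer's IG scores, and sum over layers into one signed token-level map.
    """
    weights = attn_grad / (attn_grad.sum(axis=1, keepdims=True) + 1e-12)
    return (layer_ig * weights).sum(axis=0)  # shape (T,), signed

# Synthetic example: 4 layers, 6 tokens.
rng = np.random.default_rng(0)
layer_ig = rng.normal(size=(4, 6))           # signed evidence per layer/token
attn_grad = np.abs(rng.normal(size=(4, 6)))  # saliency magnitudes

token_map = fuse_attributions(layer_ig, attn_grad)
print(token_map.shape)  # → (6,)
```

A signed map like this distinguishes supportive (positive) from opposing (negative) evidence per token, which matches the kind of attribution maps the abstract describes.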