[2602.18895] Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?
Summary
This paper explores the potential of large language models (LLMs) as post-hoc explainability tools in credit risk models, evaluating their effectiveness in translating complex model outputs for non-technical stakeholders.
Why It Matters
With the increasing complexity of credit risk models, effective communication of model outputs to stakeholders is crucial. This research highlights how LLMs can serve as a bridge between technical data and stakeholder understanding, enhancing transparency and governance in financial decision-making.
Key Takeaways
- LLMs can effectively translate complex model outputs for non-technical stakeholders.
- Few-shot prompting improves feature overlap for logistic regression but does not consistently benefit XGBoost (a minimal overlap sketch follows this list).
- LLMs should be viewed as narrative interfaces rather than replacements for traditional explainability tools.
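The paper does not spell out its overlap metric here, so the following is a hedged sketch assuming a simple top-k set comparison between the features an LLM cites in its explanation and the model's highest-ranked attributions; the function name and feature names are illustrative:

```python
def feature_overlap(model_top_features, llm_features, k=5):
    """Fraction of the model's top-k attributed features that the LLM also cites."""
    model_set = set(model_top_features[:k])
    return len(model_set & set(llm_features)) / k

# Hypothetical example with LendingClub-style feature names:
# SHAP ranks these highest for one applicant...
model_top = ["fico_score", "dti", "annual_inc", "revol_util", "loan_amnt"]
# ...while the LLM's free-form explanation mentions these.
llm_cited = ["fico_score", "dti", "loan_amnt", "emp_length"]
print(feature_overlap(model_top, llm_cited))  # 0.6
```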
Quantitative Finance > Risk Management
arXiv:2602.18895 (q-fin)
[Submitted on 21 Feb 2026]
Title: Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?
Authors: Wenxi Geng, Dingyuan Liu, Liya Li, Yiqing Wang
Abstract: Post-hoc explainability is central to credit risk model governance, yet widely used tools such as coefficient-based attributions and SHapley Additive exPlanations (SHAP) often produce numerical outputs that are difficult to communicate to non-technical stakeholders. This paper investigates whether large language models (LLMs) can serve as post-hoc explainability tools for credit risk predictions through in-context learning, focusing on two roles: translators and autonomous explainers. Using a personal lending dataset from LendingClub, we evaluate three commercial LLMs: GPT-4-turbo, Claude Sonnet 4, and Gemini-2.0-Flash. Results provide strong evidence for the translator role. In contrast, autonomous explanations show low alignment with model-based attributions. Few-shot prompting improves feature overlap for logistic regression but does not consistently benefit XGBoost, suggesting that LLMs have limited capacity to recover non-linear, interaction-driven reasoning from prompt cues alone. Our findings position LLMs as effective narrative interfaces rather than replacements for traditional explainability tools.
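To make the translator role concrete, here is a minimal sketch of the kind of pipeline the abstract describes: fit a credit model, compute SHAP attributions for one applicant, and hand the numbers to an LLM as a prompt for plain-language narration. The toy data, model settings, and prompt wording are assumptions for illustration, not the paper's actual setup:

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb

# Toy stand-in data; the paper uses LendingClub personal-lending records.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "fico_score": rng.normal(690, 40, 500),
    "dti": rng.uniform(0, 40, 500),
    "annual_inc": rng.lognormal(11, 0.5, 500),
})
y = (X["fico_score"] < 660).astype(int)  # crude default-label proxy

# Fit the credit model and compute SHAP attributions for one applicant.
model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_row = explainer.shap_values(X.iloc[[0]])[0]

# "Translator" prompt: give the LLM the numeric attributions and ask for
# a plain-language narrative; the exact prompt wording is an assumption.
attributions = "\n".join(
    f"- {feat}: {val:+.3f}" for feat, val in zip(X.columns, shap_row)
)
prompt = (
    "You are explaining a credit risk model to a loan officer.\n"
    "These SHAP attributions describe one applicant's predicted default risk:\n"
    f"{attributions}\n"
    "Summarize, in plain language, which factors raised or lowered the risk."
)
print(prompt)
```

Note that in this setup the LLM never sees the raw model; it only rephrases attributions SHAP has already computed, which is what separates the translator role from the autonomous-explainer role the paper finds unreliable.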