[2602.18895] Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?


arXiv - Machine Learning · 4 min read

Summary

This paper explores the potential of large language models (LLMs) as post-hoc explainability tools in credit risk models, evaluating their effectiveness in translating complex model outputs for non-technical stakeholders.

Why It Matters

With the increasing complexity of credit risk models, effective communication of model outputs to stakeholders is crucial. This research highlights how LLMs can serve as a bridge between technical data and stakeholder understanding, enhancing transparency and governance in financial decision-making.

Key Takeaways

  • LLMs can effectively translate complex model outputs for non-technical stakeholders.
  • Few-shot prompting improves feature overlap for logistic regression but not consistently for XGBoost.
  • LLMs should be viewed as narrative interfaces rather than replacements for traditional explainability tools.
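The "translator" role can be illustrated with a minimal sketch (the prompt template, feature names, and attribution values below are hypothetical illustrations, not taken from the paper): per-feature numerical attributions are formatted into a prompt asking an LLM to restate them for a non-technical audience.

```python
# Sketch of the translator setup: turn per-feature attributions
# (e.g. SHAP values or coefficient contributions) into an LLM prompt.
# Feature names, values, and wording are illustrative only.
attributions = {
    "fico_score": -0.42,   # lowers predicted default risk
    "dti": 0.31,           # raises predicted default risk
    "loan_amount": 0.12,
}

def build_translator_prompt(attributions, decision="declined"):
    # List features by magnitude of contribution, largest first.
    lines = [
        f"- {name}: {value:+.2f} ({'raises' if value > 0 else 'lowers'} risk)"
        for name, value in sorted(
            attributions.items(), key=lambda kv: -abs(kv[1])
        )
    ]
    return (
        f"A credit model {decision} an application. "
        "Explain the following feature attributions to a non-technical "
        "reader in two or three sentences:\n" + "\n".join(lines)
    )

prompt = build_translator_prompt(attributions)
print(prompt)
```

In this framing the LLM never computes attributions itself; it only verbalizes numbers produced by a conventional explainability tool, which is the role the paper finds strong evidence for.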

Quantitative Finance > Risk Management
arXiv:2602.18895 (q-fin) [Submitted on 21 Feb 2026]

Title: Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?
Authors: Wenxi Geng, Dingyuan Liu, Liya Li, Yiqing Wang

Abstract: Post-hoc explainability is central to credit risk model governance, yet widely used tools such as coefficient-based attributions and SHapley Additive exPlanations (SHAP) often produce numerical outputs that are difficult to communicate to non-technical stakeholders. This paper investigates whether large language models (LLMs) can serve as post-hoc explainability tools for credit risk predictions through in-context learning, focusing on two roles: translators and autonomous explainers. Using a personal lending dataset from LendingClub, we evaluate three commercial LLMs: GPT-4-turbo, Claude Sonnet 4, and Gemini-2.0-Flash. Results provide strong evidence for the translator role. In contrast, autonomous explanations show low alignment with model-based attributions. Few-shot prompting improves feature overlap for logistic regression but does not consistently benefit XGBoost, suggesting that LLMs have limited capacity to recover non-linear, interaction-driven reasoning from prompt cues alone. Our findings position LLMs as effective narrative interfaces…
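The "feature overlap" evaluation the abstract refers to can be sketched as follows, under assumptions of ours rather than the paper's exact protocol: take the top-k features a model attributes to one prediction (here, coefficient-based attributions from a logistic regression on synthetic data), take the features an LLM names in its explanation (a hypothetical hand-written set below), and score their Jaccard overlap.

```python
# Sketch (not the paper's exact protocol): compare a model's top-k
# attributed features with the features an LLM names. The data,
# feature names, and "LLM answer" below are synthetic illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["loan_amount", "annual_income", "dti",
                 "fico_score", "revol_util", "emp_length"]
X = rng.normal(size=(500, len(feature_names)))
# Synthetic default label driven by fico_score, dti, annual_income.
y = (X[:, 3] - X[:, 2] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def top_k_features(x, k=3):
    # Coefficient-based attribution for one applicant: |beta_j * x_j|.
    contrib = np.abs(model.coef_[0] * x)
    return {feature_names[i] for i in np.argsort(contrib)[::-1][:k]}

def jaccard(a, b):
    # Overlap score in [0, 1]: 1 means identical feature sets.
    return len(a & b) / len(a | b)

model_top = top_k_features(X[0])
llm_top = {"fico_score", "dti", "loan_amount"}  # hypothetical LLM answer
print(f"overlap = {jaccard(model_top, llm_top):.2f}")
```

For a tree ensemble like XGBoost one would swap the coefficient attribution for SHAP values; the paper's point is that prompt cues alone do not reliably let the LLM recover those interaction-driven attributions.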


