[2602.10117] Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Summary

The paper presents a fully automated, black-box pipeline for detecting unverbalized biases in Large Language Models (LLMs) and demonstrates its effectiveness on several decision-making tasks.

Why It Matters

Understanding and mitigating biases in LLMs is crucial for ensuring fairness in AI applications. This research introduces a scalable method for identifying biases that may not be explicitly stated, which can enhance the reliability of AI systems in sensitive areas like hiring and admissions.

Key Takeaways

  • The proposed pipeline automates the detection of unverbalized biases in LLMs.
  • It is evaluated on seven LLMs across three decision tasks: hiring, loan approval, and university admissions.
  • The method identifies both new and previously known biases, improving bias evaluation processes.
  • Statistical techniques, including multiple-testing correction and early stopping, are used to test candidate bias concepts rigorously (see the sketch after this list).
  • This research contributes to the broader discourse on AI fairness and accountability.
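
The statistical check referenced above can be made concrete with a small sketch. The following is a minimal illustration, not the authors' code, assuming the task decisions are binary (accept/reject): it compares the model's acceptance rate on positive versus negative variations of the same inputs with a two-proportion z-test, then applies a Holm-Bonferroni correction across all candidate concepts. The function names and the 0.05 threshold are illustrative assumptions.

```python
import math

def two_proportion_p_value(k_pos: int, n_pos: int, k_neg: int, n_neg: int) -> float:
    """Two-sided z-test for a difference in acceptance rates between the
    positive and negative variations generated for one candidate bias concept."""
    p_pos, p_neg = k_pos / n_pos, k_neg / n_neg
    pooled = (k_pos + k_neg) / (n_pos + n_neg)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_pos + 1 / n_neg))
    if se == 0:
        return 1.0  # no variation in decisions at all
    z = (p_pos - p_neg) / se
    # two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))

def holm_bonferroni(p_values: dict, alpha: float = 0.05) -> set:
    """Step-down Holm-Bonferroni correction over all concepts tested."""
    significant = set()
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])  # ascending p-values
    m = len(ordered)
    for i, (concept, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            significant.add(concept)
        else:
            break  # Holm's procedure stops at the first non-significant concept
    return significant
```

Holm's step-down procedure controls the family-wise error rate across the many autorater-proposed concepts without assuming independence between the tests, which is why it is a common default for this kind of multiple-testing screen.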

Computer Science > Machine Learning
arXiv:2602.10117 (cs)
Submitted on 10 Feb 2026 (v1), last revised 19 Feb 2026 (this version, v3)

Title: Biases in the Blind Spot: Detecting What LLMs Fail to Mention
Authors: Iván Arcuschin, David Chanin, Adrià Garriga-Alonso, Oana-Maria Camburu

Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automated, black-box pipeline for detecting task-specific unverbalized biases. Given a task dataset, the pipeline uses LLM autoraters to generate candidate bias concepts. It then tests each concept on progressively larger input samples by generating positive and negative variations, and applies statistical techniques for multiple testing and early stopping. A concept is flagged as an unverbalized bias if it yields statistically significant performance differences while not being cited as justification in the model's CoTs. We evaluate our pipeline across seven LLMs on three decision tasks (hiring, loan approval, and university admissions). Our technique automat...
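
To make the abstract's pipeline concrete, here is a simplified sketch of the outer detection loop under stated assumptions: propose_concepts, make_variations, model_decision, and concept_in_cot are hypothetical placeholders for the LLM autorater and model calls, the fixed sample schedule stands in for the paper's early-stopping procedure, and the significance test reuses two_proportion_p_value from the sketch above.

```python
SAMPLE_SCHEDULE = [20, 50, 100]  # progressively larger input samples (illustrative)
ALPHA = 0.05

def detect_unverbalized_biases(task_inputs, model):
    """Flag concepts that shift decisions significantly but never appear in the CoT."""
    flagged = []
    # Step 1: an LLM autorater proposes candidate bias concepts for this task.
    for concept in propose_concepts(task_inputs):
        p_value, pos = 1.0, []
        # Step 2: test the concept on progressively larger samples.
        for n in SAMPLE_SCHEDULE:
            # Paired variations: each input rewritten with the concept present (pos)
            # and absent (neg), everything else held fixed.
            pos, neg = make_variations(task_inputs[:n], concept)
            k_pos = sum(model_decision(model, x) for x in pos)  # accepted count
            k_neg = sum(model_decision(model, x) for x in neg)
            p_value = two_proportion_p_value(k_pos, n, k_neg, n)
            if p_value > 0.5:
                break  # illustrative early stop: no sign of an effect, skip larger samples
        # Step 3: check whether the model's chain of thought ever cites the concept.
        verbalized = any(concept_in_cot(model, x, concept) for x in pos)
        if p_value <= ALPHA and not verbalized:
            flagged.append(concept)  # significant effect the CoT never mentions
    return flagged
```

In a full implementation the per-concept p-values would be fed through holm_bonferroni from the earlier sketch rather than compared to ALPHA one at a time, and the early stop would be a proper sequential test rather than the fixed cutoff used here.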
