[2602.20332] No One Size Fits All: QueryBandits for Hallucination Mitigation

arXiv - Machine Learning · 4 min read

Summary

The paper introduces QueryBandits, a model-agnostic contextual bandit framework that mitigates hallucinations in large language models (LLMs) by learning, per query, which query-rewrite strategy to apply. Across 16 QA scenarios, it demonstrates significant improvements over static rewrite policies.

Why It Matters

As LLMs become increasingly prevalent, understanding and mitigating hallucinations is crucial for their effective deployment, especially in institutional settings where closed-source models dominate. This research provides a novel approach that adapts the rewrite strategy to each query, enhancing the reliability of AI systems.

Key Takeaways

  • QueryBandits adaptively selects a query-rewrite strategy for each query using contextual bandits.
  • The top variant (Thompson Sampling) achieves an 87.5% win rate over a No-Rewrite baseline.
  • Outperforms zero-shot static policies such as Paraphrase and Expand.
  • Demonstrates that no single rewrite policy is optimal for all queries.
  • Enables improvements in closed-source models without retraining, since no parameter access is required.

Computer Science > Computation and Language
arXiv:2602.20332 (cs) [Submitted on 23 Feb 2026]
Title: No One Size Fits All: QueryBandits for Hallucination Mitigation
Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso

Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations in closed-source models is especially concerning, as they constitute the vast majority of models in institutional deployments. We introduce QueryBandits, a model-agnostic contextual bandit framework that adaptively learns online to select the optimal query-rewrite strategy by leveraging an empirically validated and calibrated reward function. Across 16 QA scenarios, our top QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a No-Rewrite baseline and outperforms zero-shot static policies (e.g., Paraphrase or Expand) by 42.6% and 60.3%, respectively. Moreover, all contextual bandits outperform vanilla bandits across all datasets, with higher feature variance coinciding with greater variance in arm selection. This substantiates our finding that there is no single rewrite policy optimal for all queries. We also discover that cert...
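The paper's exact query features and calibrated reward function are not given in this summary, so the sketch below only illustrates the general mechanism the abstract describes: linear Thompson Sampling over rewrite "arms" (e.g., Paraphrase, Expand, No-Rewrite), where each arm keeps a Bayesian linear model of reward given query features. The feature construction and reward values here are hypothetical placeholders.

```python
import numpy as np

class LinTSArm:
    """Bayesian linear reward model for one rewrite strategy (arm)."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.25):
        self.A = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)            # precision-weighted sum of rewards
        self.noise_var = noise_var

    def sample_reward(self, x, rng):
        # Draw a plausible parameter vector from the posterior,
        # then score the query features x under that draw.
        cov = np.linalg.inv(self.A)
        mean = cov @ self.b
        theta = rng.multivariate_normal(mean, self.noise_var * cov)
        return float(theta @ x)

    def update(self, x, reward):
        # Standard Bayesian linear-regression update.
        self.A += np.outer(x, x) / self.noise_var
        self.b += reward * x / self.noise_var

def select_rewrite(arms, x, rng):
    """Thompson Sampling: pick the arm whose sampled reward is highest."""
    samples = [arm.sample_reward(x, rng) for arm in arms]
    return int(np.argmax(samples))
```

In a deployment loop, one would featurize the incoming query, call `select_rewrite` to choose a rewrite strategy, apply it before querying the (possibly closed-source) LLM, and feed an observed reward signal back via `update`. Because only query rewriting is touched, no model retraining is needed, which matches the paper's motivation.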
