[2602.20332] No One Size Fits All: QueryBandits for Hallucination Mitigation
Summary
The paper introduces QueryBandits, a model-agnostic contextual bandit framework designed to mitigate hallucinations in large language models (LLMs) by adaptively selecting query-rewrite strategies. It demonstrates significant improvements over static rewrite policies across 16 QA scenarios.
Why It Matters
As LLMs become increasingly prevalent, understanding and mitigating hallucinations is crucial for their effective deployment, especially in institutional settings where closed-source models dominate. This research provides a novel approach that adapts to different queries, enhancing the reliability of AI systems.
Key Takeaways
- QueryBandits framework adapts query-rewrite strategies for better performance.
- Outperforms static policies, achieving an 87.5% win rate over a No-Rewrite baseline.
- Highlights the importance of flexibility in query-rewriting to reduce hallucinations.
- Demonstrates that a one-size-fits-all approach to query rewriting is ineffective: no single rewrite policy is optimal for all queries.
- Enables improvements in closed-source models without retraining.
Computer Science > Computation and Language, arXiv:2602.20332 (cs)
[Submitted on 23 Feb 2026]
Title: No One Size Fits All: QueryBandits for Hallucination Mitigation
Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso
Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations in closed-source models is especially concerning, as they constitute the vast majority of models in institutional deployments. We introduce QueryBandits, a model-agnostic contextual bandit framework that adaptively learns online to select the optimal query-rewrite strategy by leveraging an empirically validated and calibrated reward function. Across 16 QA scenarios, our top QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a No-Rewrite baseline and outperforms zero-shot static policies (e.g., Paraphrase or Expand) by 42.6% and 60.3%, respectively. Moreover, all contextual bandits outperform vanilla bandits across all datasets, with higher feature variance coinciding with greater variance in arm selection. This substantiates our finding that there is no single rewrite policy optimal for all queries. We also discover that cert...
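To make the mechanism concrete, the core idea of a contextual Thompson Sampling bandit over rewrite strategies can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the arm names, feature dimension, and simulated reward signal are all assumptions, and the paper's calibrated reward function is replaced here by a toy linear model.

```python
import numpy as np

class LinearThompsonBandit:
    """Linear Thompson Sampling over query-rewrite 'arms'.

    Illustrative sketch only: each arm keeps a Bayesian linear-regression
    posterior over reward weights; we sample weights per arm and pick the
    arm whose sampled reward estimate for this query's features is highest.
    """

    def __init__(self, arms, dim, noise=1.0):
        self.arms = arms
        self.A = {a: np.eye(dim) for a in arms}      # per-arm precision matrix
        self.b = {a: np.zeros(dim) for a in arms}    # per-arm reward-weighted features
        self.noise = noise

    def select(self, x):
        """Sample reward weights from each arm's posterior; return the argmax arm."""
        best_arm, best_val = None, -np.inf
        for a in self.arms:
            cov = np.linalg.inv(self.A[a])
            mean = cov @ self.b[a]
            theta = np.random.multivariate_normal(mean, self.noise * cov)
            val = float(theta @ x)
            if val > best_val:
                best_arm, best_val = a, val
        return best_arm

    def update(self, arm, x, reward):
        """Rank-1 posterior update for the arm that was played."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy run: three hypothetical rewrite strategies, 5-dim query features,
# and a simulated linear reward standing in for a hallucination score.
rng = np.random.default_rng(0)
arms = ["no_rewrite", "paraphrase", "expand"]
bandit = LinearThompsonBandit(arms, dim=5)
true_theta = {a: rng.normal(size=5) for a in arms}
for _ in range(200):
    x = rng.normal(size=5)                              # query feature vector
    arm = bandit.select(x)
    reward = true_theta[arm] @ x + 0.1 * rng.normal()   # simulated feedback
    bandit.update(arm, x, reward)
print(bandit.select(rng.normal(size=5)))
```

Because arm selection depends on the sampled posterior and the query's features, different queries can be routed to different rewrite strategies, which is the adaptivity the paper contrasts with static policies.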