[2603.04454] Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity's Last Exam

arXiv - AI

About this article

Abstract page for arXiv paper 2603.04454: Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity's Last Exam

Computer Science > Computation and Language

arXiv:2603.04454 (cs)
[Submitted on 27 Feb 2026]

Title: Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity's Last Exam

Authors: Michael Majurski, Cynthia Matuszek

Abstract: How carefully and unambiguously a question is phrased has a profound impact on the quality of the response, for Language Models (LMs) as well as people. While model capabilities continue to advance, the interplay between grounding context and query formulation remains under-explored. This work investigates how the quality of background grounding information in a model's context window affects accuracy. We find that combining well-grounded dynamic context construction (i.e., RAG) with query rewriting reduces question ambiguity, resulting in significant accuracy gains. Given a user question with associated answer-free grounding context, rewriting the question to reduce ambiguity produces benchmark improvements without changing the answer itself, even compared to prepending that context before the question. Using gpt-oss-20b to rewrite a subset of Humanity's Last Exam using answer-free grounding context improves gpt-5-mini accuracy from 0.14 to 0.37. We demonstrate that this accuracy improvement cannot be fully recovered just throug...
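The abstract contrasts two ways of using answer-free grounding context: prepending it to the unchanged question, or using it to rewrite the question before it is answered. The sketch below illustrates only that contrast. The instruction text, the function names, and the `rewriter` callable (a stand-in for whatever model performs the rewrite) are assumptions made for illustration; the abstract does not give the paper's actual prompts or pipeline.

```python
from typing import Callable

# Hypothetical instruction; the paper's real rewriting prompt is not in the abstract.
REWRITE_INSTRUCTION = (
    "Rewrite the question below so that it is unambiguous, using the "
    "background context only to clarify terms. Do not answer the question."
)


def build_prepend_prompt(question: str, context: str) -> str:
    """Baseline: place the grounding context in front of the unchanged question."""
    return f"Context:\n{context}\n\nQuestion:\n{question}"


def build_rewrite_request(question: str, context: str) -> str:
    """Prompt sent to the rewriting model (assumed format)."""
    return f"{REWRITE_INSTRUCTION}\n\nContext:\n{context}\n\nQuestion:\n{question}"


def disambiguate(question: str, context: str, rewriter: Callable[[str], str]) -> str:
    """Return a rewritten question, falling back to the original if the
    rewriter returns nothing usable."""
    rewritten = rewriter(build_rewrite_request(question, context)).strip()
    return rewritten or question


if __name__ == "__main__":
    # Stub rewriter so the sketch runs without any model or network access.
    def stub_rewriter(prompt: str) -> str:
        return "At a pressure of 1 atm, at what temperature in Celsius does water boil?"

    question = "When does it boil?"
    context = "The passage discusses water at standard atmospheric pressure."
    print(build_prepend_prompt(question, context))
    print(disambiguate(question, context, stub_rewriter))
```

In this sketch the answering model would receive either the output of `build_prepend_prompt` or the rewritten question alone, which is the comparison the abstract describes.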

Originally published on March 06, 2026. Curated by AI News.
