[2602.20300] What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
Summary
This article examines how specific linguistic features of queries impact the performance of Large Language Models (LLMs), particularly in relation to hallucinations.
Why It Matters
Understanding the relationship between query structure and LLM performance is crucial for improving AI systems. By identifying features that lead to hallucinations, developers can create better query frameworks, enhancing the reliability of AI outputs and user interactions.
Key Takeaways
- Certain linguistic features increase the likelihood of LLM hallucinations.
- Deep clause nesting and underspecification are linked to higher hallucination rates.
- Clear intention grounding and answerability reduce hallucination risks.
- Findings can inform query rewriting strategies for better AI performance.
- The study provides a framework for future research on query features.
Computer Science > Computation and Language
arXiv:2602.20300 (cs)
[Submitted on 23 Feb 2026]
Title: What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
Authors: William Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso
Abstract: Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and model's) response. We operationalize this insight by constructing a 22-dimension query feature vector covering clause complexity, lexical rarity, anaphora, negation, answerability, and intention grounding, all known to affect human comprehension. Using 369,837 real-world queries, we ask: are there certain types of queries that make hallucination more likely? A large-scale analysis reveals a consistent "risk landscape": features such as deep clause nesting and underspecification align with higher hallucination propensity, whereas clear intention grounding and answerability align with lower hallucination rates. Others, including domain specificity, show mixed, dataset- and model-dependent effects. These findings establish an empirically observable query-feature representation...
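The abstract describes turning each query into a feature vector over properties like clause nesting, negation, and underspecification. The paper's actual 22 features and their definitions are not reproduced here; the following is a minimal, hypothetical sketch of what extracting a few such query-level features might look like, using crude token-based proxies of my own choosing:

```python
import re

# Toy proxies for a few query-level linguistic features.
# These heuristics are illustrative assumptions, not the paper's method.
SUBORDINATORS = {"that", "which", "because", "although", "if", "while"}
NEGATION_WORDS = {"not", "no", "never", "none", "cannot", "n't"}
VAGUE_PRONOUNS = {"it", "this", "that", "they"}

def query_features(query: str) -> dict:
    """Compute a small, illustrative feature vector for a query."""
    tokens = re.findall(r"[A-Za-z']+", query.lower())
    return {
        # Overall query length in tokens.
        "length": len(tokens),
        # Crude clause-nesting proxy: count subordinating conjunctions.
        "clause_nesting": sum(t in SUBORDINATORS for t in tokens),
        # Negation proxy: count negation markers.
        "negation_count": sum(t in NEGATION_WORDS for t in tokens),
        # Underspecification proxy: pronouns that may lack clear antecedents.
        "vague_reference": sum(t in VAGUE_PRONOUNS for t in tokens),
    }

print(query_features("Explain why it failed, although the test that ran passed."))
# → {'length': 10, 'clause_nesting': 2, 'negation_count': 0, 'vague_reference': 2}
```

A real pipeline would replace these token counts with parser-based measures (e.g. parse-tree depth for clause nesting), but the shape of the output, one numeric vector per query, is what the paper's large-scale analysis operates on.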