[2604.06195] Hallucination as output-boundary misclassification: a composite abstention architecture for language models
Computer Science > Computation and Language
arXiv:2604.06195 (cs)
[Submitted on 12 Mar 2026]

Title: Hallucination as output-boundary misclassification: a composite abstention architecture for language models
Authors: Angelina Hintsanen

Abstract: Large language models often produce unsupported claims. We frame this as a misclassification error at the output boundary, where internally generated completions are emitted as if they were grounded in evidence. This motivates a composite intervention that combines instruction-based refusal with a structural abstention gate. The gate computes a support deficit score, S_t, from three black-box signals: self-consistency (A_t), paraphrase stability (P_t), and citation coverage (C_t), and blocks output when S_t exceeds a threshold. In a controlled evaluation across 50 items, five epistemic regimes, and three models, neither mechanism alone was sufficient. Instruction-only prompting reduced hallucination sharply but still showed over-cautious abstention on answerable items and residual hallucination for GPT-3.5-turbo. The structural gate preserved answerable accuracy across models but missed confident confabulation on conflicting-evidence items. The composite architecture achieved high overall accuracy with low hallucination, while also inheriting some over-cautious abstention.
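The abstract specifies the gate's inputs (A_t, P_t, C_t) and its thresholded decision, but not the functional form of S_t. The sketch below is a minimal illustration assuming S_t is a weighted sum of the three signals' deficits; the weights, the threshold value, the helper names, and the abstention message are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of the structural abstention gate described in the abstract.
# Assumptions (not from the paper): S_t is a weighted sum of signal deficits,
# equal weights, threshold 0.5, and a fixed refusal string.

from dataclasses import dataclass


@dataclass
class GateConfig:
    w_a: float = 1.0 / 3   # weight on self-consistency deficit (assumed)
    w_p: float = 1.0 / 3   # weight on paraphrase-stability deficit (assumed)
    w_c: float = 1.0 / 3   # weight on citation-coverage deficit (assumed)
    threshold: float = 0.5  # abstention threshold (assumed)


def support_deficit(a_t: float, p_t: float, c_t: float, cfg: GateConfig) -> float:
    """Combine the three black-box signals (each in [0, 1], higher = better
    supported) into a single support-deficit score S_t."""
    return (cfg.w_a * (1.0 - a_t)
            + cfg.w_p * (1.0 - p_t)
            + cfg.w_c * (1.0 - c_t))


def gate(answer: str, a_t: float, p_t: float, c_t: float,
         cfg: GateConfig = GateConfig()) -> str:
    """Emit the answer only if the support deficit stays at or below the
    threshold; otherwise abstain."""
    s_t = support_deficit(a_t, p_t, c_t, cfg)
    if s_t > cfg.threshold:
        return "I don't have enough support to answer reliably."
    return answer


# Example: a well-supported completion passes, a shaky one is blocked.
print(gate("Paris is the capital of France.", a_t=0.9, p_t=0.95, c_t=0.8))
print(gate("The treaty was signed in 1487.", a_t=0.4, p_t=0.3, c_t=0.0))
```

In practice the three signals would themselves be estimated from model behavior, e.g. agreement across sampled completions for self-consistency and answer stability across reworded prompts for paraphrase stability; those estimators are black-box in the paper's framing and are outside this sketch.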