[2603.26829] Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

arXiv - AI March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.26829: Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

Computer Science > Machine Learning

arXiv:2603.26829 (cs) [Submitted on 27 Mar 2026]

Title: Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

Authors: Nathaniel Oh, Paul Attie

Abstract: Language models detect false premises when asked directly but absorb them under conversational pressure, producing authoritative professional output built on errors they already identified. This failure, order-gap hallucination, is invisible to output inspection because the error migrates into the activation space of the safety circuit, suppressed but not erased. We introduce Squish and Release (S&R), an activation-patching architecture with two components: a fixed detector body (layers 24-31, the localized safety evaluation circuit) and a swappable detector core (an activation vector controlling perception direction). A safety core shifts the model from compliance toward detection; an absorb core reverses it. We evaluate on OLMo-2 7B using the Order-Gap Benchmark - 500 chains across 500 domains, all manually graded. Key findings: cascade collapse is near-total (99.8% compliance at O5); the detector body is binary and localized (layers 24-31 shift 93.6%, layers 0-23 contribute zero, p<10^-189); a synthetically engineered core releases 76.6% of collapsed chains; detectio...
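The body/core split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say whether the core vector is added to or substituted for the activations, so the sketch assumes an additive patch, and the names run_with_core, toy_layers, safety_core and absorb_core are hypothetical. The only detail taken from the abstract is the layer range 24-31.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]

# Layers 24-31 are the "detector body" named in the abstract.
BODY_LAYERS = range(24, 32)


def add_vectors(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def run_with_core(
    layers: Sequence[Layer],
    hidden: Vector,
    core: Sequence[float],
    body: Iterable[int] = BODY_LAYERS,
    scale: float = 1.0,
) -> Vector:
    """Run the layer stack, adding the core vector after each body layer.

    Assumption: the patch is additive. The abstract does not specify this.
    """
    body_set = set(body)
    steer = [scale * c for c in core]
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        if index in body_set:
            hidden = add_vectors(hidden, steer)
    return hidden


# Toy stand-in for a 32-layer model: every layer is the identity map.
toy_layers: List[Layer] = [lambda h: list(h) for _ in range(32)]

# Hypothetical cores: opposite directions along the first coordinate.
safety_core = [1.0, 0.0]
absorb_core = [-1.0, 0.0]

start = [0.0, 0.0]
toward_detection = run_with_core(toy_layers, start, safety_core)
toward_compliance = run_with_core(toy_layers, start, absorb_core)
```

In this toy setting the same body (the layer range) is reused while only the core vector is swapped, which mirrors the fixed-body, swappable-core structure the abstract describes. A real experiment would hook the residual stream of an actual transformer rather than identity layers.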

Originally published on March 31, 2026. Curated by AI News.

Related Articles

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

[2603.16790] InCoder-32B: Code Foundation Model for Industrial Scenarios

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

[2603.11066] Exploring Collatz Dynamics with Human-LLM Collaboration
