[2604.00770] Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
Computer Science > Machine Learning
arXiv:2604.00770 (cs)
[Submitted on 1 Apr 2026]

Title: Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
Authors: Swapnil Parekh

Abstract: A new generation of language models reasons entirely in continuous hidden states, producing no tokens and leaving no audit trail. We show that this silence creates a fundamentally new attack surface. ThoughtSteer perturbs a single embedding vector at the input layer; the model's own multi-pass reasoning amplifies this perturbation into a hijacked latent trajectory that reliably produces the attacker's chosen answer, while remaining structurally invisible to every token-level defense. Across two architectures (Coconut and SimCoT), three reasoning benchmarks, and model scales from 124M to 3B parameters, ThoughtSteer achieves >=99% attack success rate with near-baseline clean accuracy, transfers to held-out benchmarks without retraining (94-100%), evades all five evaluated active defenses, and survives 25 epochs of clean fine-tuning. We trace these results to a unifying mechanism: Neural Collapse in the latent space pulls triggered representations onto a tight geometric attractor, explaining both why defenses fail and why any effective backdoor must leave a linearly separable signature (probe AUC >= 0.999). Yet a striking paradox emerges...
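A minimal sketch of the attack pattern the abstract describes, assuming a Coconut-style model that feeds its last hidden state back as the next input instead of decoding a token. The toy `LatentReasoner` class, the trigger vector `delta`, and all dimensions are hypothetical illustrations, not the paper's implementation; the point is only how a one-vector perturbation at the embedding layer compounds silently across latent reasoning passes:

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Toy Coconut-style model (hypothetical): reasons for several passes
    in continuous latent space, feeding the last hidden state back as the
    next input embedding instead of emitting a token."""
    def __init__(self, vocab_size=1000, d_model=64, latent_steps=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.core = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)
        self.latent_steps = latent_steps

    def forward(self, input_ids, trigger_delta=None):
        x = self.embed(input_ids)                      # (B, T, d)
        if trigger_delta is not None:
            # ThoughtSteer-style injection: perturb a single input
            # embedding (here, position 0) with one learned vector.
            x = x.clone()
            x[:, 0, :] = x[:, 0, :] + trigger_delta
        out, h = self.core(x)
        thought = out[:, -1:, :]                       # last hidden state
        for _ in range(self.latent_steps):
            # Continuous latent reasoning: no tokens are produced, so the
            # perturbation is amplified across passes with no audit trail.
            thought, h = self.core(thought, h)
        return self.head(thought[:, -1, :])            # answer logits

model = LatentReasoner()
ids = torch.randint(0, 1000, (2, 8))
delta = 0.5 * torch.randn(64)      # in the attack this would be optimized
clean_logits = model(ids)
poisoned_logits = model(ids, trigger_delta=delta)
```

Because the trigger lives in embedding space rather than in the token sequence, nothing a token-level defense can inspect changes between the clean and poisoned runs.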
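The claimed detection signature (probe AUC >= 0.999) suggests a simple linear probe over latent states can separate triggered from clean inputs. A hedged sketch with synthetic stand-in data, since the paper's probing setup is not shown here; the Neural Collapse effect is simulated by placing triggered latents on a tight attractor:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d = 64
# Stand-in latents: clean states are scattered, triggered states are
# collapsed onto a tight geometric attractor, as the abstract describes.
clean = rng.normal(0.0, 1.0, size=(500, d))
attractor = rng.normal(0.0, 1.0, size=d)
triggered = attractor + rng.normal(0.0, 0.05, size=(500, d))

X = np.vstack([clean, triggered])
y = np.concatenate([np.zeros(500), np.ones(500)])

# A plain logistic-regression probe suffices when the two populations
# are linearly separable in latent space.
probe = LogisticRegression(max_iter=1000).fit(X, y)
auc = roc_auc_score(y, probe.predict_proba(X)[:, 1])
print(f"probe AUC: {auc:.4f}")  # approaches 1.0 as representations collapse
```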