[2604.04385] How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
Computer Science > Computation and Language
arXiv:2604.04385 (cs)
[Submitted on 6 Apr 2026]

Title: How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
Authors: Gregory N. Frank

Abstract: We identify a recurring sparse routing mechanism in alignment-trained language models: a gate attention head reads detected content and triggers downstream amplifier heads that boost the signal toward refusal. Using political censorship and safety refusal as natural experiments, we trace this mechanism across 9 models from 6 labs, each validated on corpora of 120 prompt pairs. The gate head passes necessity and sufficiency interchange tests (p < 0.001, permutation null), and the core amplifier heads are stable under bootstrap resampling (Jaccard 0.92-1.0). Three same-generation scaling pairs show that routing becomes more distributed at scale (ablation effects up to 17x weaker) while remaining detectable by interchange. By modulating the detection-layer signal, we continuously control policy strength from hard refusal through steering to factual compliance, with routing thresholds that vary by topic. The circuit also reveals a structural separation between intent recognition and policy routing: under cipher encoding, the gate head's routing contribution collapses (78% in Phi-4 at n=120) while the model re...
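The abstract reports that the gate head passes interchange tests against a permutation null (p < 0.001). The paper's exact procedure is not shown on this page; the following is a minimal sketch of a one-sided sign-permutation test over per-pair interchange effects, with the function name and the exchangeable-sign null as illustrative assumptions.

```python
import random


def permutation_pvalue(effects, n_perm=10_000, seed=0):
    """One-sided permutation test: is the mean per-pair interchange
    effect greater than expected under a sign-exchangeable null?

    `effects` holds one causal-effect measurement per prompt pair
    (e.g. change in refusal logit after patching the gate head).
    Under the null, each effect's sign is random, so we compare the
    observed mean against means of randomly sign-flipped copies.
    """
    rng = random.Random(seed)
    obs = sum(effects) / len(effects)
    count = 0
    for _ in range(n_perm):
        perm = sum(e * rng.choice((-1.0, 1.0)) for e in effects) / len(effects)
        if perm >= obs:
            count += 1
    # add-one smoothing keeps the estimate strictly positive
    return (count + 1) / (n_perm + 1)
```

With 20 uniformly positive effects, the sign-flipped mean almost never reaches the observed mean, so the p-value is near the 1/(n_perm+1) floor; all-zero effects give p = 1.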
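The amplifier-head stability claim (Jaccard 0.92-1.0 under bootstrap resampling) can be sketched as follows. The head-attribution scores and the top-k selection rule here are hypothetical stand-ins; the paper's scoring method is not given on this page.

```python
import random


def jaccard(a, b):
    """Overlap of two head sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def top_heads(scores, k=5):
    """Keep the k heads with the largest attribution scores."""
    return [h for h, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]


def bootstrap_stability(per_prompt_scores, k=5, n_boot=1000, seed=0):
    """Jaccard similarity between the top-k head set on the full
    prompt corpus and top-k sets on bootstrap resamples of it.

    `per_prompt_scores` maps, for each prompt pair, head name ->
    attribution score. Returns (min, mean) Jaccard over resamples.
    """
    rng = random.Random(seed)
    n = len(per_prompt_scores)

    def mean_scores(sample):
        agg = {}
        for s in sample:
            for h, v in s.items():
                agg[h] = agg.get(h, 0.0) + v / len(sample)
        return agg

    full = top_heads(mean_scores(per_prompt_scores), k)
    sims = []
    for _ in range(n_boot):
        sample = [per_prompt_scores[rng.randrange(n)] for _ in range(n)]
        sims.append(jaccard(full, top_heads(mean_scores(sample), k)))
    return min(sims), sum(sims) / len(sims)
```

A core circuit is "stable" in this sense when the same heads survive resampling of the 120 prompt pairs, i.e. the Jaccard distribution stays near 1.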
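The continuous-control result (hard refusal through steering to factual compliance) rests on scaling the detection-layer signal. One common way to realize such a dial, assuming the signal is a single direction in the residual stream, is to rescale the hidden state's component along that direction; the direction itself and the function below are illustrative, not the paper's implementation.

```python
def steer(hidden, direction, alpha):
    """Rescale the component of `hidden` along `direction` by alpha.

    alpha = 1 leaves the activation unchanged, alpha = 0 removes the
    detection signal entirely (toward compliance), alpha > 1
    amplifies it (toward hard refusal).
    """
    norm2 = sum(d * d for d in direction)
    proj = sum(h * d for h, d in zip(hidden, direction)) / norm2
    return [h + (alpha - 1.0) * proj * d for h, d in zip(hidden, direction)]
```

Sweeping alpha at the detection layer is then a one-parameter knob on policy strength, which is the kind of intervention the abstract describes as moving the model across routing thresholds.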