[2602.13483] Finding Highly Interpretable Prompt-Specific Circuits in Language Models

arXiv - AI · 4 min read

Summary

This paper introduces an approach to identifying prompt-specific circuits in language models, showing that circuits vary from prompt to prompt even within a fixed task. This finding refines how mechanistic interpretability analyses should be scoped.

Why It Matters

Understanding how language models operate at a granular level is crucial for improving their interpretability and reliability. This research shifts focus from task-level analysis to prompt-specific mechanisms, which can lead to better model design and application in real-world scenarios.

Key Takeaways

  • Circuits in language models are prompt-specific, not task-specific.
  • The ACC++ method extracts cleaner, lower-dimensional causal signals inside attention heads from a single forward pass, reducing attribution noise.
  • Different prompts can induce systematically different mechanisms in models.
  • Prompts can be grouped into families with similar circuits for analysis.
  • An automated pipeline for interpretability can enhance understanding of model behavior.
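The paper reports that prompts group into "families" with similar circuits. As a hedged illustration only (not the authors' actual method), one simple way to operationalize this is to represent each prompt's circuit as the set of attention heads it uses and greedily group prompts whose circuits overlap strongly under Jaccard similarity. All prompt IDs, head indices, and the threshold below are hypothetical toy values.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two circuits, each a set of (layer, head) pairs."""
    return len(a & b) / len(a | b)

def group_into_families(circuits: dict, threshold: float = 0.5) -> list:
    """Greedily assign each prompt to the first family whose
    representative circuit it matches above `threshold`."""
    families = []  # list of (representative_circuit, [prompt_ids])
    for prompt_id, circuit in circuits.items():
        for rep, members in families:
            if jaccard(circuit, rep) >= threshold:
                members.append(prompt_id)
                break
        else:
            # No existing family is close enough: start a new one.
            families.append((circuit, [prompt_id]))
    return families

# Toy example: (layer, head) circuits recovered for four prompts.
circuits = {
    "p1": {(9, 6), (9, 9), (10, 0)},
    "p2": {(9, 6), (9, 9), (10, 7)},
    "p3": {(0, 1), (3, 0)},
    "p4": {(0, 1), (3, 0), (5, 5)},
}
fams = group_into_families(circuits)
print([members for _, members in fams])  # → [['p1', 'p2'], ['p3', 'p4']]
```

Greedy thresholding is the simplest choice here; any clustering over a circuit-similarity matrix would serve the same illustrative purpose.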

Computer Science > Machine Learning
arXiv:2602.13483 (cs) · Submitted on 13 Feb 2026

Title: Finding Highly Interpretable Prompt-Specific Circuits in Language Models
Authors: Gabriel Franco, Lucas M. Tassis, Azalea Rohr, Mark Crovella

Abstract: Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. Most prior work identifies circuits at the task level by averaging across many prompts, implicitly assuming a single stable mechanism per task. We show that this assumption can obscure a crucial source of structure: circuits are prompt-specific, even within a fixed task. Building on attention causal communication (ACC) (Franco & Crovella, 2025), we introduce ACC++, refinements that extract cleaner, lower-dimensional causal signals inside attention heads from a single forward pass. Like ACC, our approach does not require replacement models (e.g., SAEs) or activation patching; ACC++ further improves circuit precision by reducing attribution noise. Applying ACC++ to indirect object identification (IOI) in GPT-2, Pythia, and Gemma 2, we find there is no single circuit for IOI in any model: different prompt templates induce systematically different mechanisms. Despite this variation, prompts cluster into prompt families with similar circuits, and we propose ...
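The abstract describes extracting per-head causal signals from a single forward pass, without replacement models or activation patching. The sketch below is not ACC++ itself; it illustrates one generic single-pass attribution idea under stated assumptions: project each attention head's output contribution at the final position onto a readout direction for the correct answer, and keep the heads whose projection dominates. The head indices, vectors, and the `keep` cutoff are all hypothetical toy values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def head_scores(head_outputs: dict, readout: list) -> dict:
    """Project each head's output vector onto the readout direction."""
    norm = math.sqrt(dot(readout, readout))
    return {h: dot(v, readout) / norm for h, v in head_outputs.items()}

def circuit(head_outputs: dict, readout: list, keep: float = 0.5) -> set:
    """Keep heads whose score is at least `keep` times the max score."""
    scores = head_scores(head_outputs, readout)
    cutoff = keep * max(scores.values())
    return {h for h, s in scores.items() if s >= cutoff}

# Toy per-head contributions at the final token (3-dim residual stream):
head_outputs = {
    (9, 6): [2.0, 0.1, 0.0],  # strongly aligned with the readout
    (9, 9): [1.5, 0.0, 0.2],  # also aligned
    (5, 5): [0.0, 1.0, 0.0],  # orthogonal: contributes nothing
}
readout = [1.0, 0.0, 0.0]
print(circuit(head_outputs, readout))  # → {(9, 6), (9, 9)}
```

In a real model the head contributions would come from a hooked forward pass and the readout from the unembedding row of the answer token; here everything is hand-set to keep the example self-contained.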

