[2602.22968] Certified Circuits: Stability Guarantees for Mechanistic Circuits

arXiv - AI · 3 min read

Summary

The paper introduces Certified Circuits, a framework that enhances the stability and accuracy of circuit discovery in neural networks, addressing the limitations of existing methods.

Why It Matters

Understanding neural networks' decision-making processes is crucial for their deployment and reliability. Certified Circuits provide a formal approach to ensure that discovered circuits are stable and accurate, which is vital for advancing mechanistic interpretability in AI.

Key Takeaways

  • Certified Circuits offer provable stability guarantees for circuit discovery.
  • The framework achieves up to 91% higher accuracy while using 45% fewer neurons.
  • It addresses the brittleness of existing circuit discovery methods.
  • Stable circuits yield more reliable mechanistic explanations.
  • Code for the framework will be released soon, promoting further research.

Computer Science > Artificial Intelligence

arXiv:2602.22968 (cs) · Submitted on 26 Feb 2026

Title: Certified Circuits: Stability Guarantees for Mechanistic Circuits

Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, Jonas Fischer

Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits: minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods are brittle: circuits depend strongly on the chosen concept dataset and often fail to transfer out-of-distribution, raising doubts whether they capture concept or dataset-specific artifacts. We introduce Certified Circuits, which provide provable stability guarantees for circuit discovery. Our framework wraps any black-box discovery algorithm with randomized data subsampling to certify that circuit component inclusion decisions are invariant to bounded edit-distance perturbations of the concept dataset. Unstable neurons are abstained from, yielding circuits that are more compact and more accurate. On ImageNet and OOD datasets, certified circuits achieve up to 91% higher accuracy while using 45% fewer neurons, and remain reliable where baselines degrade. Certified Circuits puts circuit d...
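The abstract describes wrapping a black-box discovery algorithm with randomized data subsampling and abstaining from unstable components. The paper's exact certification procedure is not given here, so the following is a minimal illustrative sketch only: it assumes a hypothetical `discover(data) -> set of component ids` interface, runs it on random subsamples, and keeps only components whose inclusion decision is consistent across runs. The function names, thresholds, and toy data are all assumptions for illustration, not the authors' method.

```python
import random

def certify_circuit(discover, dataset, n_runs=30, keep_frac=0.8, tau=0.9, seed=0):
    """Sketch of subsampling-based stability filtering (hypothetical API).

    discover: black-box function mapping a dataset to a set of component ids.
    Components included in at least `tau` fraction of subsampled runs are
    kept; the rest are abstained from, yielding a smaller, stabler circuit.
    """
    rng = random.Random(seed)
    counts = {}
    k = max(1, int(keep_frac * len(dataset)))
    for _ in range(n_runs):
        subsample = rng.sample(dataset, k)
        for comp in discover(subsample):
            counts[comp] = counts.get(comp, 0) + 1
    return {c for c, n in counts.items() if n / n_runs >= tau}

# Toy discovery rule: include a neuron if its mean activation over the
# (sub)sampled data exceeds 0.5. Neuron 0 is clearly active, neuron 2 is
# clearly inactive, and neuron 1 sits right at the decision boundary.
data = [{0: 0.9, 1: 0.2 + 0.6 * (i % 2), 2: 0.1} for i in range(50)]

def toy_discover(sample):
    sums = {}
    for x in sample:
        for neuron, act in x.items():
            sums[neuron] = sums.get(neuron, 0.0) + act
    return {n for n, s in sums.items() if s / len(sample) > 0.5}

stable = certify_circuit(toy_discover, data)
print(stable)
```

In this toy run, the borderline neuron 1 flips in and out of the discovered set across subsamples and is therefore abstained from, while the clearly active neuron 0 survives; this mirrors the compactness-plus-stability effect the abstract reports, without implementing the paper's edit-distance certificates.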

