[2602.22581] IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
Summary
The paper presents IBCircuit, an end-to-end framework that applies the Information Bottleneck principle to discover task-specific circuits in language models holistically.
Why It Matters
Understanding the internal workings of language models is central to interpretability. Most existing circuit-discovery methods overlook the holistic nature of circuits and require corrupted activations to be designed separately for each task, which the authors argue is both inaccurate and inefficient. IBCircuit replaces this per-task design with a single optimization framework that applies to any given task.
Key Takeaways
- IBCircuit builds on the Information Bottleneck principle to find informative circuits.
- The framework identifies circuits holistically through a single end-to-end optimization.
- It eliminates the need for task-specific corrupted activations.
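- On the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and minimal circuits than recent related work, at the level of both critical nodes and edges.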
- This approach can be applied to various tasks without extensive redesign.
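For reference, the classic Information Bottleneck objective (Tishby, Pereira and Bialek, 1999) seeks a representation Z of an input X that stays informative about a target Y while discarding as much of X as possible. The summary does not reproduce IBCircuit's exact loss, so reading Z as the activations that a learned circuit mask lets through, with β as the trade-off weight, is our assumption:

$$
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(X;Z), \qquad \beta > 0
$$

Neither mutual information is usually computed exactly: I(Z;Y) is typically replaced by a task loss on Y, and I(X;Z) by a variational upper bound such as a KL divergence to a fixed noise prior. A larger β favors smaller circuits; a smaller β favors faithfulness.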
arXiv:2602.22581 (cs), submitted 26 Feb 2026
Authors: Tian Bian, Yifan Niu, Chaohao Yuan, Chengzhi Piao, Bingzhe Wu, Long-Kai Huang, Yu Rong, Tingyang Xu, Hong Cheng, Jia Li
Abstract: Circuit discovery has recently attracted attention as a potential research direction to explain the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for different tasks, which is inaccurate and inefficient. In this work, we propose an end-to-end approach based on the principle of Information Bottleneck, called IBCircuit, to identify informative circuits holistically. IBCircuit is an optimization framework for holistic circuit discovery and can be applied to any given task without tedious corrupted-activation design. In both the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and minimal circuits in terms of critical node components and edge components compared to recent related work.
Subjects: Machine Learnin...
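The abstract gives no equations, so the following Python is only a toy sketch of the general idea under our own assumptions, not IBCircuit's actual objective or code. It uses a hypothetical "model" whose output is a weighted sum of four components, one learnable gate per component, and noise injection in the style of information-bottleneck attribution methods: each gated activation keeps a fraction g of its signal and fills the rest with Gaussian noise matched to that activation's own mean and standard deviation, so no task-specific corrupted input is designed. The loss trades a faithfulness term (squared error against the clean output, standing in for I(Z;Y)) against a per-component Gaussian KL that upper-bounds I(X;Z), and all gates are optimized jointly. The names `discover`, `COMPONENTS`, `WEIGHTS` and all hyperparameters are hypothetical.

```python
import math
import random

# Hypothetical toy setup (not IBCircuit's code): the "model" output is a fixed
# weighted sum of four components. x and sin(x) carry the task; x^2 and
# cos(3x) contribute only a little.
COMPONENTS = [
    ("x", lambda x: x),
    ("x^2", lambda x: x * x),
    ("sin(x)", math.sin),
    ("cos(3x)", lambda x: math.cos(3 * x)),
]
WEIGHTS = [2.0, 0.1, 1.5, 0.1]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def discover(beta=0.5, lr=0.5, steps=300, n=64, seed=0):
    """Jointly fit one gate per component; return the final gate values."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    acts = [[f(x) for _, f in COMPONENTS] for x in xs]  # clean activations h_i
    clean = [sum(w * h for w, h in zip(WEIGHTS, row)) for row in acts]
    k = len(COMPONENTS)
    mu = [sum(row[i] for row in acts) / n for i in range(k)]
    sd = [math.sqrt(sum((row[i] - mu[i]) ** 2 for row in acts) / n) for i in range(k)]
    # Fixed standard-normal draws make the loss a deterministic function of the logits.
    eps = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

    def loss(logits):
        gates = [sigmoid(a) for a in logits]
        mse = kl = 0.0
        for row, y, e in zip(acts, clean, eps):
            # Keep a fraction g of each activation; fill the rest with N(mu, sd^2) noise.
            z = [g * h + (1 - g) * (m + s * ei) for g, h, m, s, ei in zip(gates, row, mu, sd, e)]
            # Faithfulness: squared error against the clean model output.
            mse += (sum(w * zi for w, zi in zip(WEIGHTS, z)) - y) ** 2
            # Compression: KL(N(g*h + (1-g)*mu, ((1-g)*sd)^2) || N(mu, sd^2)) per component.
            for g, h, m, s in zip(gates, row, mu, sd):
                kl += -math.log(1 - g) + ((1 - g) ** 2 + g * g * (h - m) ** 2 / s**2) / 2 - 0.5
        return (mse + beta * kl) / n

    logits = [0.0] * k  # every gate starts at 0.5
    for _ in range(steps):
        grad = []
        for i in range(k):  # central finite differences keep the sketch stdlib-only
            up, down = logits[:], logits[:]
            up[i] += 1e-4
            down[i] -= 1e-4
            grad.append((loss(up) - loss(down)) / 2e-4)
        logits = [a - lr * g for a, g in zip(logits, grad)]
    return {name: sigmoid(a) for (name, _), a in zip(COMPONENTS, logits)}


gates = discover()
for name, g in gates.items():
    print(f"{name:8s} gate={g:.2f}")
```

With these settings the two task-carrying components should keep much larger gates than x^2 and cos(3x); raising `beta` prunes more aggressively. Finite differences are used only to keep the sketch dependency-free; a realistic version would apply automatic differentiation to masks over a transformer's nodes or edges.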