The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
Summary
The article presents a novel evaluation framework for mechanistic interpretability research that uses AI agents to assess research rigor and reproducibility beyond traditional narrative review.
Why It Matters
This research addresses the critical issue of reproducibility in science, which is especially acute in AI, where autonomous agents can generate large volumes of research output. By proposing an execution-grounded evaluation framework that inspects code and data rather than narrative alone, it aims to improve the assessment of research quality, which is vital for advancing scientific integrity and trust in AI technologies.
Key Takeaways
- Introduces an execution-grounded evaluation framework for research.
- Utilizes AI agents to assess research rigor and reproducibility.
- Achieves over 80% agreement with human judges on evaluation outcomes (illustrated in the sketch after this list).
- Identifies significant methodological issues often missed by human reviewers.
- Demonstrates the potential of AI agents to strengthen scientific evaluation practice.
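To make the "over 80% agreement with human judges" figure concrete, the sketch below computes simple percent agreement between automated and human verdicts over a set of evaluated research outputs. The verdict labels, data, and function names here are hypothetical illustrations; the paper does not specify that this exact metric was used.

```python
# Hypothetical illustration: percent agreement between an automated
# evaluator's verdicts and human judges' verdicts. The labels and data
# below are invented for illustration, not taken from the paper.

def percent_agreement(agent_verdicts: list[str], human_verdicts: list[str]) -> float:
    """Fraction of items on which the automated evaluator and the human judge agree."""
    assert len(agent_verdicts) == len(human_verdicts)
    matches = sum(a == h for a, h in zip(agent_verdicts, human_verdicts))
    return matches / len(agent_verdicts)

# Toy example: verdicts on five research outputs; they disagree on one.
agent = ["reproducible", "not_reproducible", "reproducible", "reproducible", "reproducible"]
human = ["reproducible", "not_reproducible", "reproducible", "not_reproducible", "reproducible"]
print(f"Agreement: {percent_agreement(agent, human):.0%}")  # -> Agreement: 80%
```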
Computer Science > Computers and Society
arXiv:2602.18458 (cs)
[Submitted on 5 Feb 2026]
Title: The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
Authors: Xiaoyan Bai, Alexander Baumgartner, Haojia Sun, Ari Holtzman, Chenhao Tan
Abstract: Reproducibility crises across sciences highlight the limitations of the paper-centric review system in assessing the rigor and reproducibility of research. AI agents that autonomously design and generate large volumes of research outputs exacerbate these challenges. In this work, we address the growing challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We propose the first execution-grounded evaluation framework that verifies research beyond narrative review by examining code and data alongside the paper. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent, an automated evaluation framework that assesses the coherence of the experimental process, the reproducibility of results, and the generalizability of findings. We show that our framework achieves above 80% agreement with human judges, identifies substantial methodological problems, and surfaces 51 additional issues that...
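To make the idea of execution-grounded evaluation concrete, here is a minimal sketch of one of its building blocks: re-running an artifact's experiment script and checking a number claimed in the paper against the reproduced value. Everything in this sketch (file paths, the printed metric format, the tolerance, the function names) is an assumption for illustration only; it is not the MechEvalAgent pipeline, which is described in the paper itself.

```python
# Minimal sketch of an execution-grounded reproducibility check, assuming a
# hypothetical artifact layout: an experiment script that prints a metric,
# and a claimed value extracted from the paper. None of these paths or
# formats come from the MechEvalAgent paper; they are illustrative only.
import re
import subprocess

def reproduced_metric(script: str) -> float:
    """Run the artifact's experiment script and parse the metric it prints."""
    out = subprocess.run(
        ["python", script], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"accuracy\s*=\s*([0-9.]+)", out)  # assumed output format
    if match is None:
        raise ValueError("experiment script did not report the expected metric")
    return float(match.group(1))

def check_claim(script: str, claimed: float, tol: float = 0.01) -> bool:
    """Flag a claim as reproduced if the re-run value is within tolerance."""
    return abs(reproduced_metric(script) - claimed) <= tol

if __name__ == "__main__":
    # Hypothetical usage: the paper under review claims accuracy = 0.87.
    ok = check_claim("experiments/run_probe.py", claimed=0.87)
    print("claim reproduced" if ok else "claim NOT reproduced")
```

A full evaluator would also need to check that the code matches the method the paper describes (coherence) and that findings hold under perturbed settings (generalizability); this sketch covers only the reproducibility step.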