[2512.22174] BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs
Computer Science > Distributed, Parallel, and Cluster Computing
arXiv:2512.22174 (cs)
[Submitted on 18 Dec 2025 (v1), last revised 14 Apr 2026 (this version, v2)]
Title: BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs
Authors: Muhammad Zeeshan Karamat, Sadman Saif, Christiana Chamon Garcia

Abstract: Large Language Models (LLMs) deployed in practical and safety-critical settings are increasingly susceptible to bit-flip faults caused by hardware degradation, cosmic radiation, or deliberate fault-injection attacks such as Rowhammer. These faults silently corrupt internal parameters and can lead to unpredictable or dangerous model behavior. Localizing these corruptions is essential: without identifying the affected region, it is impossible to diagnose the source of degradation, apply targeted corrective measures, or restore model functionality without resorting to costly fine-tuning or full retraining. This work introduces BitFlipScope, a scalable, software-based framework for identifying fault-affected regions within transformer architectures under two deployment scenarios. When a clean reference model is available, BitFlipScope performs differential analysis of outputs, hidden states, and internal activations for detecting anomalous behavior ind...
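
To make the reference-model scenario concrete, the following is a minimal sketch of the general idea of differential analysis, not BitFlipScope's actual method. It flips one bit in a float32 weight of a toy element-wise "model" and then compares per-layer hidden states against a clean copy to find the first layer that diverges. The toy model and the names `flip_bit`, `forward`, and `first_divergent_layer` are assumptions made for illustration and do not come from the paper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` stored as float32 with one bit flipped (0 = LSB, 31 = sign)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def forward(layers, x):
    """Run a toy stack of element-wise layers and return the hidden state after each one."""
    states = []
    for weights in layers:
        # Hypothetical layer: per-element scaling followed by ReLU.
        x = [max(0.0, w * v) for w, v in zip(weights, x)]
        states.append(x)
    return states


def first_divergent_layer(ref_states, sus_states, tol=1e-6):
    """Return (layer index, max abs difference) for the first layer whose
    hidden state differs from the reference by more than `tol`, else None."""
    for i, (ref, sus) in enumerate(zip(ref_states, sus_states)):
        gap = max(abs(a - b) for a, b in zip(ref, sus))
        if gap > tol:
            return i, gap
    return None


# Clean reference weights (all exactly representable in float32).
reference = [
    [0.5, 1.0, 1.5],
    [1.0, 0.5, 2.0],
    [0.5, 0.5, 0.5],
    [2.0, 1.0, 1.0],
]

# Suspect copy: flip the top exponent bit (bit 30) of one weight in layer 2.
# For 0.5 this turns the stored value into 2**127.
corrupted = [list(w) for w in reference]
corrupted[2][1] = flip_bit(corrupted[2][1], 30)

probe = [1.0, 2.0, 3.0]
hit = first_divergent_layer(forward(reference, probe), forward(corrupted, probe))
print(hit)
```

Because every layer before the corrupted one sees identical inputs and weights, its hidden states match the reference exactly, so the first divergent layer points at the corrupted region. A real transformer would need a tolerance suited to its numeric precision and a way to capture intermediate activations, both of which are outside this sketch.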