[2601.16905] GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints
Summary
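The paper presents GRIP (Geometric Routing Invariance Preservation), an algorithm-agnostic framework for machine unlearning in Mixture-of-Experts (MoE) architectures. It targets a failure mode of existing methods, which manipulate the router to redirect queries away from knowledgeable experts instead of erasing the knowledge itself.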
Why It Matters
As large language models are deployed more widely, the ability to unlearn specific information matters for data privacy and compliance. GRIP aims to make unlearning in Mixture-of-Experts models, which are prevalent in large-scale AI applications, both safer and less damaging to model utility.
Key Takeaways
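- Traditional unlearning methods exploit the MoE router, steering queries away from knowledgeable experts rather than erasing knowledge, which causes utility loss and superficial forgetting.
- GRIP projects router gradient updates into an expert-specific null space, a geometric constraint meant to push knowledge erasure into the expert parameters.
- Discrete expert selections stay stable for retained knowledge, while the continuous router parameters remain free to change within the null space.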
- Extensive experiments demonstrate GRIP's effectiveness in preserving model utility during unlearning.
- Because the framework is algorithm-agnostic, existing unlearning techniques can be applied to MoE models through it.
Computer Science > Machine Learning
arXiv:2601.16905 (cs)
[Submitted on 23 Jan 2026 (v1), last revised 15 Feb 2026 (this version, v2)]
Title: GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints
Authors: Andy Zhu, Rongzhe Wei, Yupu Gu, Pan Li
Abstract: Machine unlearning (MU) for large language models has become critical for AI safety, yet existing methods fail to generalize to Mixture-of-Experts (MoE) architectures. We identify that traditional unlearning methods exploit MoE's architectural vulnerability: they manipulate routers to redirect queries away from knowledgeable experts rather than erasing knowledge, causing a loss of model utility and superficial forgetting. We propose Geometric Routing Invariance Preservation (GRIP), an algorithm-agnostic framework for unlearning for MoE. Our core contribution is a geometric constraint, implemented by projecting router gradient updates into an expert-specific null-space. Crucially, this decouples routing stability from parameter rigidity: while discrete expert selections remain stable for retained knowledge, the continuous router parameters remain plastic within the null space, allowing the model to undergo necessary internal reconfiguration to satisfy unlearning objectives. This forces the unl...
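The abstract describes the constraint only at a high level, so the following is a minimal sketch of generic null-space gradient projection, not the authors' implementation. It assumes a linear router with weight matrix `W` (experts by hidden dimension) and a matrix `H` of router inputs from retained data. The function names, shapes, and learning rate are hypothetical. It uses one shared null space for all experts, where the paper speaks of an expert-specific one, and it keeps retained logits exactly fixed, which is stricter than keeping only the discrete expert selection stable.

```python
import numpy as np

def null_space_projector(H):
    """Orthogonal projector onto {x : H @ x = 0}, computed as I - pinv(H) @ H."""
    d = H.shape[1]
    return np.eye(d) - np.linalg.pinv(H) @ H

def project_router_grad(G, H):
    """Remove from each row of G the component lying in the row space of H."""
    return G @ null_space_projector(H)

# Hypothetical sizes: hidden dim 16, 4 experts, 5 retained router inputs.
rng = np.random.default_rng(0)
d, num_experts, n_retain = 16, 4, 5
W = rng.normal(size=(num_experts, d))   # router weights (assumed linear router)
H = rng.normal(size=(n_retain, d))      # retained-data router inputs
G = rng.normal(size=(num_experts, d))   # raw router gradient from any unlearning loss

G_proj = project_router_grad(G, H)
W_new = W - 0.1 * G_proj                # one projected gradient step

logits_before = H @ W.T
logits_after = H @ W_new.T
```

In this sketch the router weights do change (`W_new` differs from `W`), yet the logits on the retained inputs are unchanged up to floating-point error, so the top-k expert choice for those inputs is preserved. This is one simple way to read the idea that routing stability can be decoupled from parameter rigidity.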