[2602.19631] Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Summary
This article summarizes a novel approach to concept erasure in text-to-image diffusion models: High-Level Representation Misdirection (HiRM), which removes target concepts by steering their high-level representations in the text encoder while minimizing degradation of image quality for non-target concepts.
Why It Matters
As text-to-image diffusion models gain traction, concerns grow about their misuse for synthesizing harmful, private, or copyrighted content. This research proposes a method to erase unwanted concepts while preserving the quality of generated images, addressing both ethical and technical challenges in AI-generated content.
Key Takeaways
- HiRM allows for precise removal of target concepts in image generation.
- The method maintains the quality of non-target concepts during generation.
- Fine-tuning early self-attention layers of the text encoder can suppress unwanted visual attributes, though naive fine-tuning degrades non-target generation quality, which motivates HiRM.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.19631 (cs)
[Submitted on 23 Feb 2026]
Title: Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Authors: Uichan Lee, Jeonghyeon Kim, Sangheum Hwang
Abstract: Recent advances in text-to-image (T2I) diffusion models have seen rapid and widespread adoption. However, their powerful generative capabilities raise concerns about potential misuse for synthesizing harmful, private, or copyrighted content. To mitigate such risks, concept erasure techniques have emerged as a promising solution. Prior works have primarily focused on fine-tuning the denoising component (e.g., the U-Net backbone). However, recent causal tracing studies suggest that visual attribute information is localized in the early self-attention layers of the text encoder, indicating a potential alternative for concept erasure. Building on this insight, we conduct preliminary experiments and find that directly fine-tuning early layers can suppress target concepts but often degrades the generation quality of non-target concepts. To overcome this limitation, we propose High-Level Representation Misdirection (HiRM), which misdirects high-level semantic representations of target concepts in the text encoder ...
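The abstract describes a two-part objective: misdirect the text encoder's representation of a target concept (e.g., toward some anchor concept) while keeping representations of non-target prompts unchanged. The paper's actual loss is not given here, so the sketch below is only an illustrative guess at that general shape; the function names, the anchor-based erasure term, and the weighting factor `lam` are all assumptions, not the authors' implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def misdirection_loss(edited_target, anchor, edited_nontarget, frozen_nontarget, lam=1.0):
    """Toy concept-erasure objective (illustrative only).

    Erasure term: pull the fine-tuned encoder's representation of the
    target concept toward an anchor concept's representation.
    Preservation term: keep non-target representations close to those of
    the frozen, original encoder.
    """
    erase = mse(edited_target, anchor)
    preserve = mse(edited_nontarget, frozen_nontarget)
    return erase + lam * preserve

# Perfect erasure and perfect preservation -> zero loss.
print(misdirection_loss([1.0, 2.0], [1.0, 2.0], [0.5], [0.5]))
```

In a real setting the four vectors would come from forward passes of the text encoder (fine-tuned vs. frozen) on target and non-target prompts, and `lam` would trade off erasure strength against preservation of unrelated concepts.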