[2602.19631] Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection



Summary

This article discusses a novel approach to concept erasure in text-to-image diffusion models: High-Level Representation Misdirection (HiRM), which removes unwanted target concepts while minimizing quality degradation for non-target generations.

Why It Matters

As text-to-image diffusion models gain traction, concerns about their misuse for harmful content arise. This research proposes a method to effectively erase unwanted concepts while preserving the quality of generated images, addressing both ethical and technical challenges in AI-generated content.

Key Takeaways

  • HiRM allows for precise removal of target concepts in image generation.
  • The method maintains the quality of non-target concepts during generation.
  • Directly fine-tuning the early self-attention layers of the text encoder can suppress unwanted visual attributes, but tends to degrade non-target generation quality, which motivates HiRM's representation-level approach.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.19631 (cs) · Submitted on 23 Feb 2026

Title: Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection

Authors: Uichan Lee, Jeonghyeon Kim, Sangheum Hwang

Abstract: Recent advances in text-to-image (T2I) diffusion models have seen rapid and widespread adoption. However, their powerful generative capabilities raise concerns about potential misuse for synthesizing harmful, private, or copyrighted content. To mitigate such risks, concept erasure techniques have emerged as a promising solution. Prior works have primarily focused on fine-tuning the denoising component (e.g., the U-Net backbone). However, recent causal tracing studies suggest that visual attribute information is localized in the early self-attention layers of the text encoder, indicating a potential alternative for concept erasing. Building on this insight, we conduct preliminary experiments and find that directly fine-tuning early layers can suppress target concepts but often degrades the generation quality of non-target concepts. To overcome this limitation, we propose High-Level Representation Misdirection (HiRM), which misdirects high-level semantic representations of target concepts in the text encoder ...
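The abstract's core idea, steering the text encoder's representation of a target concept toward that of a benign anchor concept while preserving non-target representations, can be illustrated with a toy optimization. This is a minimal sketch under stated assumptions, not the paper's actual method: the linear map standing in for the encoder's early layers, the random embeddings, and the loss weighting are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy embedding dimension

# Frozen reference encoder: a fixed linear map standing in for the
# text encoder's early self-attention layers (hypothetical stand-in).
W_frozen = rng.normal(size=(d, d)) / np.sqrt(d)
W = W_frozen.copy()  # trainable copy to be fine-tuned

# Toy embeddings for a target concept (to erase), an anchor concept
# (what the target should be redirected to), and a non-target concept.
e_target = rng.normal(size=d)
e_anchor = rng.normal(size=d)
e_other = rng.normal(size=d)

lam, lr = 1.0, 0.05  # preservation weight, step size (hypothetical)
for _ in range(500):
    # Misdirection loss: map the target concept onto the frozen
    # anchor representation, while a preservation term keeps the
    # non-target representation close to its frozen value.
    g = np.outer(W @ e_target - W_frozen @ e_anchor, e_target)
    g += lam * np.outer(W @ e_other - W_frozen @ e_other, e_other)
    W -= lr * 2 * g / d  # gradient step on the squared-error losses

erase_err = np.linalg.norm(W @ e_target - W_frozen @ e_anchor)
preserve_err = np.linalg.norm(W @ e_other - W_frozen @ e_other)
print(f"target-to-anchor distance: {erase_err:.4f}")
print(f"non-target drift:          {preserve_err:.4f}")
```

In the actual setting described by the abstract, the trainable map would correspond to the early self-attention layers of the diffusion model's text encoder, fine-tuned while the denoising backbone stays frozen; the toy run shows both errors driven near zero, i.e., the target is redirected while the non-target representation is preserved.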
