[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

Summary

The paper presents Dyslexify, a defense against typographic attacks on CLIP models that improves robustness without finetuning while largely preserving accuracy on clean images.

Why It Matters

As multi-modal systems become increasingly prevalent, understanding and mitigating vulnerabilities like typographic attacks is crucial for the reliability and safety of AI applications. Because Dyslexify requires no finetuning, it offers a practical option for safety-critical deployments.

Key Takeaways

  • Dyslexify defends CLIP models against typographic attacks by ablating a circuit of specialized attention heads.
  • The method improves accuracy on a typographic variant of ImageNet-100 by up to 22.06% without requiring model finetuning.
  • Dyslexify reduces accuracy on standard ImageNet-100 by less than 1% while enhancing robustness against injected text.
  • Despite being training-free, the approach remains competitive with current state-of-the-art defenses.
  • The release of dyslexic CLIP models provides practical tools for developers working in safety-critical AI domains.
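The head ablation named in the first takeaway can be illustrated with a minimal NumPy sketch of one multi-head self-attention layer. This is not the authors' code: it assumes zero ablation (the selected head's output is replaced by zeros before the output projection), whereas the paper's exact variant, such as mean ablation or restricting the change to the CLS position, may differ. The function name `multi_head_attention`, the `ablate_heads` parameter, and the random toy weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, ablate_heads=()):
    """Self-attention over tokens x of shape (T, D), e.g. CLS + patch tokens.

    Heads listed in ablate_heads are zeroed before the output projection,
    so they write nothing to any token, including CLS (hypothetical API).
    """
    t, d = x.shape
    d_head = d // n_heads

    def split(m):
        # (T, D) -> (n_heads, T, d_head)
        return m.reshape(t, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (H, T, T)
    heads = attn @ v                                             # (H, T, d_head)

    keep = np.ones((n_heads, 1, 1))
    for h in ablate_heads:
        keep[h] = 0.0  # zero ablation of head h
    heads = heads * keep

    # Concatenate heads back to (T, D) and apply the shared output projection
    concat = heads.transpose(1, 0, 2).reshape(t, d)
    return concat @ w_o

# Toy usage: 1 CLS token + 4 patch tokens, width 8, 4 heads, random weights
rng = np.random.default_rng(0)
T, D, H = 5, 8, 4
x = rng.normal(size=(T, D))
w_q, w_k, w_v, w_o = (rng.normal(size=(D, D)) for _ in range(4))
full = multi_head_attention(x, w_q, w_k, w_v, w_o, H)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, H, ablate_heads={2})
cls_shift = full[0] - ablated[0]  # head 2's contribution to the CLS token
```

Because the output projection is linear, each head's write to the residual stream is an additive term, so removing a head subtracts exactly that term and leaves the other heads untouched. That property is what makes disabling a small set of heads possible without any retraining.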

Computer Science > Computer Vision and Pattern Recognition
arXiv:2508.20570 (cs)
[Submitted on 28 Aug 2025 (v1), last revised 26 Feb 2026 (this version, v2)]

Title: Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

Authors: Lorenz Hufe, Constantin Venhoff, Erblina Purelku, Maximilian Dreyer, Sebastian Lapuschkin, Wojciech Samek

Abstract: Typographic attacks exploit multi-modal systems by injecting text into images, leading to targeted misclassifications, malicious content generation, and even Vision-Language Model jailbreaks. In this work, we analyze how CLIP vision encoders behave under typographic attacks, locating specialized attention heads in the latter half of the model's layers that causally extract and transmit typographic information to the CLS token. Building on these insights, we introduce Dyslexify, a method to defend CLIP models against typographic attacks by selectively ablating a typographic circuit consisting of attention heads. Without requiring finetuning, Dyslexify improves performance by up to 22.06% on a typographic variant of ImageNet-100 while reducing standard ImageNet-100 accuracy by less than 1%, and we demonstrate its utility in a medical foundation model for skin lesion diagnosis. Notably, our training-free approach remains competitive with current state-of-the-art typ...
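The abstract's localization step, finding heads that causally carry typographic information, can be approximated by a generic single-head ablation ranking. This is a hypothetical sketch, not the paper's procedure: `typo_score` stands in for any scalar you choose to measure on attacked images (for instance, how strongly the CLS embedding matches the injected word), and the toy additive score, the layer range 6 to 11 (the latter half of an assumed 12-layer encoder with 12 heads per layer), and the head indices are all invented for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Head = Tuple[int, int]  # (layer index, head index)

def rank_heads_by_ablation_effect(
    heads: Iterable[Head],
    typo_score: Callable[[FrozenSet[Head]], float],
) -> List[Tuple[Head, float]]:
    """Rank heads by how much ablating each one alone lowers typo_score."""
    baseline = typo_score(frozenset())  # score with nothing ablated
    effects = [(h, baseline - typo_score(frozenset({h}))) for h in heads]
    # Largest drop first: these heads carry the most typographic signal
    return sorted(effects, key=lambda item: item[1], reverse=True)

def select_circuit(ranked: List[Tuple[Head, float]], k: int) -> FrozenSet[Head]:
    """Take the top-k heads as the candidate circuit to ablate."""
    return frozenset(h for h, _ in ranked[:k])

# Hypothetical toy score: every head in layers 6-11 (12 heads per layer)
# adds a little signal, and two heads dominate.
toy_signal = {(layer, head): 0.01 for layer in range(6, 12) for head in range(12)}
toy_signal[(10, 3)] = 0.5
toy_signal[(11, 7)] = 0.3

def toy_typo_score(ablated: FrozenSet[Head]) -> float:
    # Additive stand-in for "how much the CLS token reflects the injected text"
    return sum(v for head, v in toy_signal.items() if head not in ablated)

ranked = rank_heads_by_ablation_effect(toy_signal.keys(), toy_typo_score)
circuit = select_circuit(ranked, k=2)
print(sorted(circuit))  # [(10, 3), (11, 7)]
```

In a real encoder each call to `typo_score` would run the model over a batch of attacked images with the given heads ablated, and clean-image accuracy would need to be checked separately so the selected circuit does not remove heads the model relies on for ordinary recognition.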
