[2604.06285] Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

arXiv - AI · April 09, 2026

Computer Science > Cryptography and Security
arXiv:2604.06285 (cs)
[Submitted on 7 Apr 2026]

Title: Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Authors: Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi, Antonio Emanuele Cinà, Fabio Roli

Abstract: Vision-Language Models (VLMs) have become essential for tasks such as image synthesis, captioning, and retrieval by aligning textual and visual information in a shared embedding space. Yet, this flexibility also makes them vulnerable to malicious prompts designed to produce unsafe content, raising critical safety concerns. Existing defenses either rely on blacklist filters, which are easily circumvented, or on heavy classifier-based systems, both of which are costly and fragile under embedding-level attacks. We address these challenges with two complementary components: Hyperbolic Prompt Espial (HyPE) and Hyperbolic Prompt Sanitization (HyPS). HyPE is a lightweight anomaly detector that leverages the structured geometry of hyperbolic space to model benign prompts and detect harmful ones as outliers. HyPS builds on this detection by applying explainable attribution methods to identify and selectively modify harmful words, neutralizing unsafe intent while preserving the original semantics of user prompts. Through extensive exper...
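To make the idea of treating harmful prompts as outliers in hyperbolic space more concrete, here is a minimal sketch. It is not the paper's HyPE implementation, which the abstract does not specify. It assumes prompt embeddings have already been mapped into the unit Poincaré ball, and it uses a distance-to-center threshold as a stand-in anomaly rule. The names `fit_radius` and `is_outlier`, the choice of center, and the quantile calibration are all hypothetical; only the Poincaré distance formula is standard.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def fit_radius(benign, center, quantile=0.95):
    """Hypothetical calibration: pick the benign distance at the given quantile."""
    dists = sorted(poincare_distance(p, center) for p in benign)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[idx]

def is_outlier(point, center, radius):
    """Flag a point whose hyperbolic distance from the center exceeds the radius."""
    return poincare_distance(point, center) > radius

# Toy 2-D data standing in for benign prompt embeddings (illustrative values only).
center = (0.0, 0.0)
benign = [(0.1, 0.0), (0.0, 0.2), (-0.15, 0.1)]
radius = fit_radius(benign, center)

near_point_flagged = is_outlier((0.05, 0.05), center, radius)
far_point_flagged = is_outlier((0.9, 0.0), center, radius)
```

Distances in the Poincaré ball grow rapidly near the boundary, so a point that looks only moderately far from the center in Euclidean terms can be very far in hyperbolic terms. That property is one reason a simple radius rule can separate points well in this toy setting.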

Originally published on April 09, 2026. Curated by AI News.
