[2505.14226] Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

arXiv - AI April 08, 2026 3 min read

About this article

Abstract page for arXiv paper 2505.14226: Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

Computer Science > Computation and Language

arXiv:2505.14226 (cs)

[Submitted on 20 May 2025 (v1), last revised 7 Apr 2026 (this version, v5)]

Title: Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

Authors: Darpan Aswal, Siddharth D Jaiswal

Abstract: Safety-aligned LLMs remain vulnerable to digital phenomena like textese that introduce non-canonical perturbations to words but preserve the phonetics. We introduce CMP-RT (code-mixed phonetic perturbations for red-teaming), a novel diagnostic probe that pinpoints tokenization as the root cause of this vulnerability. A mechanistic analysis reveals that phonetic perturbations fragment safety-critical tokens into benign sub-words, suppressing their attribution scores while preserving prompt interpretability -- causing safety mechanisms to fail despite excellent input understanding. We demonstrate that this vulnerability evades standard defenses, persists across modalities and state-of-the-art (SOTA) models including Gemini-3-Pro, and scales through simple supervised fine-tuning (SFT). Furthermore, layer-wise probing shows perturbed and canonical input representations align up to a critical layer depth; enforcing output equivalence robustly recovers the lost representations, providing causal evidence for a structural gap between pre-training and alig...

Originally published on April 08, 2026. Curated by AI News.

Related Articles

[2603.16105] Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

[2603.09643] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

[2603.07339] Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice

[2602.00185] QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities
