[2507.17937] Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
Summary
The paper introduces Adversarial PhoneTic Prompting (APT), a novel attack that exploits phonetic memorization in generative music and video models, demonstrating that current copyright safeguards based on text filtering can be bypassed.
Why It Matters
This research highlights significant weaknesses in copyright protection mechanisms used by generative AI models. By revealing how phonetic structures can bypass lexical filters, it raises concerns about the integrity of AI-generated content and the potential for copyright infringement, impacting creators and the industry.
Key Takeaways
- APT can effectively bypass copyright filters by exploiting phonetic memorization.
- APT-modified prompts yield outputs with 91% average similarity to the copyrighted originals, versus 13.7% for random lyrics and 42.2% for semantic paraphrases, indicating strong memorization in generative models.
- Model text encoders prioritize phonetic structure over semantic meaning, so homophonic substitutions can still trigger memorized copyrighted content.
- The vulnerability extends beyond music to visual content generation, showcasing cross-modal risks.
- Current defenses against phonetic-semantic attacks are inadequate, necessitating improved safeguards.
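The core mechanism above is that a substitute lyric keeps the phoneme sequence (rhyme, stress, cadence) of the original while changing the surface words. A minimal sketch of this idea, using a hand-coded toy phoneme dictionary (a real attack would derive phonemes from a grapheme-to-phoneme model; the dictionary entries and function names here are illustrative assumptions, not the paper's implementation):

```python
from difflib import SequenceMatcher

# Toy ARPAbet-style phoneme dictionary, hand-coded for illustration only.
PHONEMES = {
    "mom's":     ["M", "AA", "M", "Z"],
    "bob's":     ["B", "AA", "B", "Z"],
    "spaghetti": ["S", "P", "AH", "G", "EH", "T", "IY"],
    "confetti":  ["K", "AH", "N", "F", "EH", "T", "IY"],
    "happy":     ["HH", "AE", "P", "IY"],
    "birthday":  ["B", "ER", "TH", "D", "EY"],
}

def phoneme_seq(phrase):
    """Flatten a phrase into a single phoneme sequence."""
    seq = []
    for word in phrase.lower().split():
        seq.extend(PHONEMES[word])
    return seq

def phonetic_similarity(a, b):
    """Fraction of aligned matching phonemes between two phrases (0..1)."""
    return SequenceMatcher(None, phoneme_seq(a), phoneme_seq(b)).ratio()

# The APT substitution shares vowels, rhyme, and cadence with the original,
# while an unrelated phrase of similar length does not.
print(phonetic_similarity("mom's spaghetti", "bob's confetti"))
print(phonetic_similarity("mom's spaghetti", "happy birthday"))
```

A lexical filter comparing word tokens would score "Bob's confetti" as unrelated to the original, yet its phoneme-level similarity remains far higher than that of an arbitrary phrase, which is the gap APT exploits.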
Abstract
Computer Science > Sound, arXiv:2507.17937 (cs)
Submitted on 23 Jul 2025 (v1); last revised 25 Feb 2026 (this version, v4)
Authors: Jaechul Roh, Zachary Novack, Yuefeng Peng, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Amir Houmansadr
Generative AI systems for music and video commonly use text-based filters to prevent regurgitation of copyrighted material. We expose a significant vulnerability in this approach by introducing Adversarial PhoneTic Prompting (APT), a novel attack that bypasses these safeguards by exploiting phonetic memorization: the tendency of models to bind sub-lexical acoustic patterns (phonemes, rhyme, stress, cadence) to memorized copyrighted content. APT replaces iconic lyrics with homophonic but semantically unrelated alternatives (e.g., "mom's spaghetti" becomes "Bob's confetti"), preserving phonetic structure while evading lexical filters. We evaluate APT on leading lyrics-to-song models (Suno, YuE) across English and Korean songs spanning rap, pop, and K-pop. APT achieves 91% average similarity to copyrighted originals, versus 13.7% for random lyrics and 42.2% for semantic paraphrases. Embedding analysis confirms the mechanism: YuE's text encoder treats APT-modified lyrics as near-iden...