[2506.22685] Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
Summary
This paper addresses semantic collapse in generative personalization and proposes a training-free method that adjusts the learned concept embedding at inference time to restore text-image alignment.
Why It Matters
As generative AI becomes increasingly prevalent, preserving the semantic richness of outputs is crucial for applications in fields such as art and design. This research offers a training-free fix for a significant failure mode of personalized generation, making it relevant for developers and researchers in machine learning and AI.
Key Takeaways
- Semantic collapse reduces the richness of generated content.
- The proposed method adjusts embeddings at inference time without retraining.
- Improvements in text-image alignment can enhance user experience in generative applications.
- The approach is broadly applicable across different personalization methods.
- Understanding and mitigating semantic collapse is vital for advancing generative AI.
Computer Science > Machine Learning
arXiv:2506.22685 (cs)
[Submitted on 27 Jun 2025 (v1), last revised 25 Feb 2026 (this version, v3)]
Title: Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
Authors: Anh Bui, Trang Vu, Trung Le, Junae Kim, Tamas Abraham, Rollin Omari, Amar Kaur, Dinh Phung
Abstract: In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored topic where the learned visual concept ($V$) gradually shifts from its original textual meaning and comes to dominate other concepts in multi-concept input prompts. This issue not only reduces the semantic richness of complex input prompts like "a photo of $V$ wearing glasses and playing guitar" into simpler, less contextually rich forms such as "a photo of $V$" but also leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimisation, which allows the learned embedding $V$ to drift arbitrarily in the embedding space, both in direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of pre-trained embedding at inference time, effectively mitigating the semantic collapsing problem. Our method is br...
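The abstract attributes semantic collapse to the learned embedding drifting in both direction and magnitude, and the fix to a test-time adjustment of both. The paper does not spell out the exact update here, but the idea can be sketched roughly as follows: blend the learned embedding's direction back toward an anchor token's direction (e.g. the super-class word the concept was initialized from), then rescale it to a reference norm so it cannot dominate the other tokens in the prompt. The function name `adjust_embedding`, the `alpha` blending knob, and the norm-matching choice are all illustrative assumptions, not the authors' stated procedure.

```python
import numpy as np

def adjust_embedding(v_learned, v_anchor, alpha=0.7, match_norm=True):
    """Hypothetical test-time adjustment of a learned concept embedding.

    v_learned : embedding vector after personalization (may have drifted)
    v_anchor  : embedding of the original/super-class token (e.g. "dog")
    alpha     : fraction of the learned direction to keep (assumed knob)
    """
    # Direction adjustment: keep `alpha` of the learned direction and
    # pull the remainder toward the anchor's direction.
    d_learned = v_learned / np.linalg.norm(v_learned)
    d_anchor = v_anchor / np.linalg.norm(v_anchor)
    direction = alpha * d_learned + (1.0 - alpha) * d_anchor
    direction /= np.linalg.norm(direction)

    # Magnitude adjustment: rescale to the anchor token's norm so the
    # learned concept cannot overpower other concepts in the prompt.
    scale = np.linalg.norm(v_anchor) if match_norm else np.linalg.norm(v_learned)
    return scale * direction
```

Because the adjustment touches only a single pre-trained embedding vector at inference time, it would compose with any personalization method that represents the concept as a token embedding, consistent with the paper's claim of broad applicability.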