[2504.12007] Diffusion Generative Recommendation with Continuous Tokens
Summary
The paper presents ContRec, a novel framework that integrates continuous tokens into LLM-based recommender systems, enhancing user preference modeling and item retrieval.
Why It Matters
This research addresses limitations in traditional recommender systems by proposing a continuous tokenization approach, which improves gradient propagation and learning efficiency. It highlights the potential of generative AI in advancing recommendation technologies, making it relevant for AI researchers and industry practitioners focused on enhancing user experience through personalized recommendations.
Key Takeaways
- ContRec uses continuous tokens to improve LLM-based recommendation systems.
- The framework includes a sigma-VAE Tokenizer and a Dispersive Diffusion module for better user preference modeling.
- Experiments show ContRec outperforms traditional and state-of-the-art recommender systems.
- The approach addresses issues of lossy tokenization and inaccurate gradient propagation.
- This research opens avenues for future advancements in generative modeling for recommendations.
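The gradient-propagation issue the takeaways refer to can be made concrete with a toy sketch. The snippet below is illustrative only (the codebook, sizes, and names are assumptions, not from the paper): it contrasts a standard vector-quantized token, selected via a non-differentiable argmin, with simply keeping the encoder's continuous output as the token, which is what makes the quantization step lossy in the discrete case and exact in the continuous one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a codebook of K discrete code vectors, as in a
# standard vector-quantized tokenizer. Sizes are arbitrary.
K, d = 8, 4
codebook = rng.normal(size=(K, d))   # K candidate discrete tokens
z_e = rng.normal(size=(d,))          # encoder output (continuous)

# Discrete path: pick the nearest code via argmin. The argmin has zero
# gradient almost everywhere, so training usually relies on tricks such
# as the straight-through estimator, which copies gradients past it.
dists = np.sum((codebook - z_e) ** 2, axis=1)
k = int(np.argmin(dists))
z_q = codebook[k]                    # quantized token (lossy)

# Continuous path (the direction ContRec takes): keep z_e itself as the
# token, so no information is discarded and gradients flow exactly.
quantization_error = float(np.sum((z_q - z_e) ** 2))
print("chosen code:", k, "quantization error:", round(quantization_error, 3))
```

The `quantization_error` term is exactly the information lost by snapping a continuous embedding onto a finite codebook; with continuous tokens it is zero by construction.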
Abstract
Computer Science > Information Retrieval, arXiv:2504.12007 (cs). Submitted on 16 Apr 2025 (v1); last revised 24 Feb 2026 (this version, v5).
Authors: Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, Wenqi Fan
Recent advances in generative artificial intelligence, particularly large language models (LLMs), have opened new opportunities for enhancing recommender systems (RecSys). Most existing LLM-based RecSys approaches operate in a discrete space, using vector-quantized tokenizers to align with the inherent discrete nature of language models. However, these quantization methods often result in lossy tokenization and suboptimal learning, primarily due to inaccurate gradient propagation caused by the non-differentiable argmin operation in standard vector quantization. Inspired by the emerging trend of embracing continuous tokens in language models, we propose ContRec, a novel framework that seamlessly integrates continuous tokens into LLM-based RecSys. Specifically, ContRec consists of two key modules: a sigma-VAE Tokenizer, which encodes users/items with continuous tokens; and a Dispersive Diffusion module, which captures implicit user preference. The tokenizer is trained with a continuous Variational Auto-Encoder (VAE) objective, where three effective te...
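The abstract states that the tokenizer is trained with a continuous VAE objective. As a minimal sketch of what such a continuous tokenization step involves, assuming a Gaussian posterior q(z|x) = N(mu, sigma^2) from some encoder (the names `mu`, `log_var`, and the dimensions are illustrative, not from the paper), the reparameterization trick makes the sampled token a differentiable function of the encoder outputs, unlike the argmin in vector quantization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative encoder outputs for one user/item: mean and log-variance
# of a 4-dimensional Gaussian posterior. Values are random placeholders.
mu = rng.normal(size=(4,))
log_var = rng.normal(size=(4,)) * 0.1

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
# z is a smooth function of (mu, log_var), so gradients propagate exactly.
eps = rng.normal(size=(4,))
z = mu + np.exp(0.5 * log_var) * eps   # continuous token

# Closed-form KL(q(z|x) || N(0, I)) -- the regularizer in a VAE objective.
kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
print("token shape:", z.shape, "KL:", round(float(kl), 3))
```

Here the KL term plays the role the codebook commitment loss plays in vector quantization, but the sampling path stays fully differentiable end to end.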