[2602.12593] RQ-GMM: Residual Quantized Gaussian Mixture Model for Multimodal Semantic Discretization in CTR Prediction
Summary
The paper introduces RQ-GMM, a model that improves click-through rate (CTR) prediction by discretizing multimodal embeddings into semantic IDs, achieving better codebook utilization and reconstruction accuracy than prior quantization methods.
Why It Matters
This research addresses a core challenge in CTR prediction: integrating multimodal content effectively, which is vital for online recommendation quality. The proposed RQ-GMM model demonstrates measurable improvements in both offline metrics and online A/B tests, making it relevant for industries that depend on accurate CTR prediction, such as advertising and content platforms.
Key Takeaways
- RQ-GMM enhances CTR prediction by discretizing multimodal embeddings.
- The model improves codebook utilization and reconstruction accuracy.
- Experiments show a 1.502% gain in Advertiser Value over existing methods.
- The approach has been successfully deployed in real-world applications.
- Probabilistic modeling is key to capturing the statistical structure of embeddings.
Computer Science > Information Retrieval
arXiv:2602.12593 (cs) [Submitted on 13 Feb 2026]
Title: RQ-GMM: Residual Quantized Gaussian Mixture Model for Multimodal Semantic Discretization in CTR Prediction
Authors: Ziye Tong, Jiahao Liu, Weimin Zhang, Hongji Ruan, Derick Tang, Zhanpeng Zeng, Qinsong Zeng, Peng Zhang, Tun Lu, Ning Gu
Abstract: Multimodal content is crucial for click-through rate (CTR) prediction. However, directly incorporating continuous embeddings from pre-trained models into CTR models yields suboptimal results due to misaligned optimization objectives and convergence speed inconsistency during joint training. Discretizing embeddings into semantic IDs before feeding them into CTR models offers a more effective solution, yet existing methods suffer from limited codebook utilization, reconstruction accuracy, and semantic discriminability. We propose RQ-GMM (Residual Quantized Gaussian Mixture Model), which introduces probabilistic modeling to better capture the statistical structure of multimodal embedding spaces. Through Gaussian Mixture Models combined with residual quantization, RQ-GMM achieves superior codebook utilization and reconstruction accuracy. Experiments on public datasets and online A/B tests on a large-scale short-video platform serving hundreds of ...
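To make the core idea concrete, here is a minimal sketch of residual quantization with a per-stage Gaussian Mixture codebook. This is a hypothetical illustration, not the paper's implementation: it fits a simple spherical GMM by EM at each stage (the stage count `L`, codebook size `K`, and the hard-assignment decoding are assumptions for brevity; the paper's actual training objective and codebook design are not reproduced here).

```python
import numpy as np

def fit_gmm(X, K, iters=20, seed=0):
    """Fit a spherical GMM with EM; returns the component means (K, d)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]        # init means from data
    var = np.full(K, X.var() + 1e-6)                    # per-component variance
    pi = np.full(K, 1.0 / K)                            # mixing weights
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians (log-space for stability)
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)  # (N, K) squared distances
        logp = np.log(pi) - 0.5 * d2 / var - 0.5 * X.shape[1] * np.log(var)
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: update weights, means, variances from soft assignments
        Nk = r.sum(0) + 1e-9
        mu = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (Nk * X.shape[1]) + 1e-6
        pi = Nk / len(X)
    return mu

def rq_gmm_encode(X, L=3, K=8):
    """Residual quantization: each stage fits a GMM codebook to the residual."""
    codebooks, ids, res = [], [], X.copy()
    for _ in range(L):
        mu = fit_gmm(res, K)
        d2 = ((res[:, None, :] - mu[None]) ** 2).sum(-1)
        idx = d2.argmin(1)            # hard-assigned semantic ID for this stage
        codebooks.append(mu)
        ids.append(idx)
        res = res - mu[idx]           # next stage quantizes what is left over
    return np.stack(ids, 1), codebooks

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 16))        # stand-in for pre-trained multimodal embeddings
ids, cbs = rq_gmm_encode(X)
recon = sum(cb[i] for cb, i in zip(cbs, ids.T))
print(ids.shape)                      # one semantic ID per item per stage
print(np.mean((X - recon) ** 2) < np.mean(X ** 2))
```

Each item ends up with an `L`-tuple of discrete IDs that a downstream CTR model can embed like ordinary categorical features, while the stacked residual stages keep reconstruction error low.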