[2510.22049] Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
Summary
This paper presents VISTA (VIrtual Sequential Target Attention), a two-stage modeling framework for generative recommenders that improves scalability by summarizing long user histories into a compact set of tokens, enabling strong performance in large-scale recommendation systems.
Why It Matters
As recommendation systems increasingly rely on vast amounts of user data, the ability to efficiently process and utilize long user histories is crucial. VISTA addresses significant scalability challenges, making it relevant for industries that depend on real-time recommendations for billions of users.
Key Takeaways
- VISTA improves scalability for recommendation systems by summarizing user history into a few hundred tokens.
- The two-stage modeling framework separates user history summarization from candidate item attention, enhancing efficiency.
- This approach allows handling lifelong user histories of up to one million items without increasing serving latency or cost.
- VISTA has shown significant improvements in both offline and online metrics.
- The framework has been successfully deployed in an industry-leading recommendation platform.
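The two-stage decomposition described above can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation: the function names, the single attention head, the scaled dot-product form, and the fixed summary size k = 8 are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarize_history(history, virtual_queries):
    """Stage 1: k learned virtual tokens cross-attend to L history items,
    compressing the full history (L, d) into a summary of shape (k, d)."""
    d = history.shape[-1]
    scores = virtual_queries @ history.T / np.sqrt(d)   # (k, L)
    return softmax(scores) @ history                    # (k, d)

def score_candidate(candidate, summary):
    """Stage 2: a candidate item attends only to the k summary tokens
    (k << L), so per-candidate cost no longer grows with history length."""
    d = summary.shape[-1]
    scores = candidate @ summary.T / np.sqrt(d)         # (1, k)
    ctx = softmax(scores) @ summary                     # (1, d)
    return (candidate @ ctx.T).item()                   # scalar relevance score

# Toy usage with random embeddings (dimensions are illustrative).
rng = np.random.default_rng(0)
history = rng.normal(size=(1000, 64))   # L = 1000 history item embeddings
vq = rng.normal(size=(8, 64))           # k = 8 virtual summary tokens
summary = summarize_history(history, vq)
cand = rng.normal(size=(1, 64))
score = score_candidate(cand, summary)
```

The key property of this split is that the stage-1 summary depends only on the user history, so it can be precomputed and cached once per user and reused across all candidate items, which is how the framework keeps serving cost flat as histories grow.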
Computer Science > Information Retrieval
arXiv:2510.22049 (cs)
[Submitted on 24 Oct 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
Authors: Zhimin Chen, Chenyu Zhao, Ka Chun Mo, Yunjiang Jiang, Jane H. Lee, Khushhall Chandra Mahajan, Ning Jiang, Kai Ren, Jinhui Li, Wen-Yun Yang
Abstract: Modern large-scale recommendation systems rely heavily on user interaction history sequences to enhance model performance. The advent of large language models and sequential modeling techniques, particularly transformer-like architectures, has led to significant advancements recently (e.g., HSTU, SIM, and TWIN models). While scaling to ultra-long user histories (10k to 100k items) generally improves model performance, it also creates significant challenges on latency, queries per second (QPS), and GPU cost in industry-scale recommendation systems. Existing models do not adequately address these industrial scalability issues. In this paper, we propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA), which decomposes traditional target attention from a candidate item to user history items into two distinct stages: (1) use...