
[2602.16375] Variable-Length Semantic IDs for Recommender Systems

arXiv - Machine Learning, February 19, 2026

Summary

This paper introduces variable-length semantic identifiers (semantic IDs) for recommender systems. Instead of assigning every item a token sequence of the same fixed length, it lets different items receive descriptions of different lengths.

Why It Matters

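Generative recommenders must cope with item catalogs of extremely large cardinality, which makes training difficult and creates a vocabulary gap between natural language and item identifiers. Semantic IDs address this by representing items as sequences of low-cardinality tokens. This work proposes making those sequences variable in length, which could make item representations more efficient and better aligned with natural language.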

Key Takeaways

Variable-length semantic IDs let items receive descriptions of different lengths rather than one length for the whole catalog.
The proposed method addresses the inefficiencies of fixed-length identifiers.
Draws on the emergent communication literature, where agents often develop variable-length messages that give frequent concepts shorter descriptions.
Utilizes a discrete variational autoencoder for adaptive length representations.
Aims to narrow the vocabulary gap between natural language and item identifiers.
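To make the fixed-length setting concrete, here is a minimal sketch of one common way semantic IDs are built, residual quantization: at each level the encoder picks the codeword nearest to the still-unexplained residual, so every item ends up with exactly one token per level. This is an illustration under stated assumptions, not the paper's method. The 2-D embeddings, the three hand-picked 4-entry codebooks (`CODEBOOKS`), and the item names are hypothetical; real systems learn codebooks from high-dimensional content embeddings.

```python
# Hypothetical codebooks: 3 levels, 4 codewords each, over 2-D embeddings.
# Real systems learn these (e.g. with k-means or an RQ-VAE); values here are made up.
CODEBOOKS = [
    [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)],  # coarse level
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],  # finer level
    [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)],  # finest level
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding, codebooks):
    """Encode an embedding as one token per level, so length == len(codebooks)."""
    residual = list(embedding)
    tokens = []
    for codebook in codebooks:
        # Choose the codeword closest to what earlier levels left unexplained.
        idx = min(range(len(codebook)),
                  key=lambda i: squared_distance(residual, codebook[i]))
        tokens.append(idx)
        # Subtract the chosen codeword; the next level encodes the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)


# Hypothetical item embeddings: popularity plays no role in the encoding.
items = {
    "popular_item": (4.6, 1.2),
    "long_tail_item": (0.3, 3.9),
}
semantic_ids = {name: residual_quantize(vec, CODEBOOKS) for name, vec in items.items()}
print(semantic_ids)
```

Both items receive exactly three tokens, however often they are interacted with, which is the uniform description length the paper argues against. The cardinality reduction is also visible: 3 × 4 = 12 token types can address up to 4^3 = 64 distinct IDs, and at a more realistic scale 3 levels of 256 codes give 256^3 = 16,777,216 combinations from only 768 token types.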

Computer Science > Information Retrieval
arXiv:2602.16375 (cs) [Submitted on 18 Feb 2026]

Title: Variable-Length Semantic IDs for Recommender Systems
Authors: Kirill Khrylchenko

Abstract: Generative models are increasingly used in recommender systems, both for modeling user behavior as event sequences and for integrating large language models into recommendation pipelines. A key challenge in this setting is the extremely large cardinality of item spaces, which makes training generative models difficult and introduces a vocabulary gap between natural language and item identifiers. Semantic identifiers (semantic IDs), which represent items as sequences of low-cardinality tokens, have recently emerged as an effective solution to this problem. However, existing approaches generate semantic identifiers of fixed length, assigning the same description length to all items. This is inefficient, misaligned with natural language, and ignores the highly skewed frequency structure of real-world catalogs, where popular items and rare long-tail items exhibit fundamentally different information requirements.

In parallel, the emergent communication literature studies how agents develop discrete communication protocols, often producing variable-length messages in which frequent concepts receive shorter descriptions. Despite the conceptual similarity, these ide...
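The principle that frequent concepts should receive shorter descriptions is the same one behind classical Huffman coding. The sketch below uses Huffman coding as a swapped-in, well-understood stand-in (it is not the paper's discrete-VAE method) to show how much a skewed catalog can save over a fixed-length code. Assumptions: a binary alphabet (real semantic IDs use larger codebooks), and the hypothetical item names and interaction counts in `freqs`.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(freqs):
    """Return {item: code length in bits} for a binary Huffman code."""
    if len(freqs) == 1:
        # A lone item still needs one symbol to be transmitted.
        return {item: 1 for item in freqs}
    tiebreak = count()  # unique counter so the heap never compares dicts
    # Heap entries: (subtree weight, tiebreaker, {item: depth within subtree}).
    heap = [(f, next(tiebreak), {item: 0}) for item, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every item inside gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {item: depth + 1 for item, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical, sharply skewed interaction counts for an 8-item catalog.
freqs = {"item0": 64, "item1": 32, "item2": 16, "item3": 8,
         "item4": 4, "item5": 2, "item6": 1, "item7": 1}

lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
# Average code length weighted by how often each item occurs.
expected_len = sum(freqs[item] * lengths[item] for item in freqs) / total
# A fixed-length code must spend ceil(log2 N) bits on every item.
fixed_len = math.ceil(math.log2(len(freqs)))

print(lengths)
print(f"expected variable length: {expected_len:.3f} bits, fixed length: {fixed_len} bits")
```

The most popular item gets a 1-bit code and the two rarest get 7 bits, yet the frequency-weighted average is under 2 bits versus 3 bits for the fixed-length code. Long-tail items pay more, popular items pay less, and the catalog as a whole is described more compactly, which mirrors the abstract's point about skewed frequency structure.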
