[2507.12202] Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
Summary
This paper proposes using sparse autoencoders (SAE) to improve the interpretability and controllability of sequential recommendation models, particularly those built on transformer architectures.
Why It Matters
Understanding and controlling recommendation systems is crucial for improving user experience and tailoring suggestions. Because transformer-based recommenders are effectively black boxes, interpretability research like this can yield better insights into model behavior, more effective applications across industries, and greater user trust.
Key Takeaways
- Sparse Autoencoders can enhance the interpretability of transformer-based recommendation systems.
- The proposed framework allows for flexible control over model behavior, adapting recommendations to specific contexts.
- Interpretable features extracted through SAE can lead to more meaningful insights from neural networks.
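The takeaways above rest on a simple mechanism: an SAE maps a transformer hidden state into a sparse, overcomplete feature space, where individual features can be inspected or clamped before decoding back, which is what enables "flexible control." The following is a minimal numpy sketch of that idea; the dimensions, parameter names, and the `steer` helper are illustrative assumptions, not the paper's implementation (in practice the SAE weights are trained, not random).

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # hidden size and (overcomplete) SAE dictionary size

# Hypothetical SAE parameters; in practice these are learned by minimizing
# a reconstruction loss plus an L1 sparsity penalty on feature activations.
W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(h):
    # ReLU encoder yields sparse, non-negative feature activations.
    return np.maximum(0.0, h @ W_enc + b_enc)

def decode(f):
    # Linear decoder reconstructs the hidden state from the features.
    return f @ W_dec + b_dec

def steer(h, feature_idx, value):
    """Reconstruct h with one interpretable feature clamped to `value`."""
    f = encode(h)
    f[feature_idx] = value
    return decode(f)

h = rng.normal(size=d_model)            # a transformer hidden state
f = encode(h)                           # sparse feature activations
h_steered = steer(h, feature_idx=3, value=5.0)
```

Clamping a single feature and decoding gives a modified hidden state that can be fed back into the model, steering its recommendations along one interpretable direction.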
Computer Science > Information Retrieval — arXiv:2507.12202 (cs)
[Submitted on 16 Jul 2025 (v1), last revised 17 Feb 2026 (this version, v2)]
Authors: Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev
Abstract
Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their internals can help understand, influence, and control their behavior, which is very important in a variety of real-world applications. Recently, sparse autoencoders (SAE) have been shown to be a promising unsupervised approach to extract interpretable features from neural networks. In this work, we extend SAE to sequential recommender systems and propose a framework for interpreting and controlling model representations. We show that this approach can be successfully applied to the transformer trained on a sequential recommendation task: directions learned in such an unsupervised regime turn out to be more interpretable and monosemantic than the original hidden state dimensions. Further, we demonstrate a st...
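The "unsupervised regime" the abstract refers to is typically a reconstruction objective with a sparsity penalty on the feature activations. A common formulation (the exact objective used in the paper is not shown in this excerpt, so treat this as a standard-SAE assumption) is:

```latex
f = \mathrm{ReLU}(W_{\text{enc}} h + b_{\text{enc}}), \qquad
\hat{h} = W_{\text{dec}} f + b_{\text{dec}}, \qquad
\mathcal{L} = \lVert h - \hat{h} \rVert_2^2 + \lambda \lVert f \rVert_1
```

Here $h$ is a transformer hidden state, $f$ the sparse feature vector, and $\lambda$ trades off reconstruction fidelity against sparsity; sparsity is what pushes individual features toward monosemantic, interpretable directions.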