[2602.21052] Position-Aware Sequential Attention for Accurate Next Item Recommendations
Summary
The paper presents a novel kernelized self-attention mechanism designed to enhance next-item recommendations by improving the representation of positional information in sequential data.
Why It Matters
This research addresses the limitations of traditional additive positional embeddings in self-attention models, which can obscure the temporal order of sequences. By introducing a position-aware mechanism, the authors enhance the model's ability to capture complex sequential patterns, which is crucial for applications in recommendation systems and information retrieval.
Key Takeaways
- Traditional positional embeddings can weaken the model's sensitivity to sequence order.
- The proposed kernelized self-attention mechanism disentangles positional information from item semantics.
- Experiments show significant improvements in next-item prediction benchmarks compared to existing methods.
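The entanglement problem in the first two takeaways can be seen directly in the algebra of dot-product attention. Below is a minimal NumPy illustration (shapes and variable names are illustrative, not from the paper): with additive positional embeddings, every attention logit expands into item-item, item-position, position-item, and position-position terms, so semantics and position are mixed in a single score.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8
item_emb = rng.normal(size=(seq_len, d))   # item semantics
pos_emb = rng.normal(size=(seq_len, d))    # learned absolute positions

# Additive scheme: position is summed into the input representation,
# so every downstream dot product mixes the two signals.
x = item_emb + pos_emb
logits = x @ x.T

# The logit decomposes into a semantic term plus three positional
# cross terms that the model cannot separate.
cross = item_emb @ pos_emb.T + pos_emb @ item_emb.T + pos_emb @ pos_emb.T
assert np.allclose(logits, item_emb @ item_emb.T + cross)
```

The proposed method avoids this mixing by keeping positional information out of the input embeddings entirely.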
Computer Science > Information Retrieval
arXiv:2602.21052 (cs)
[Submitted on 24 Feb 2026]
Title: Position-Aware Sequential Attention for Accurate Next Item Recommendations
Authors: Timur Nabiev, Evgeny Frolov
Abstract: Sequential self-attention models usually rely on additive positional embeddings, which inject positional information into item representations at the input. In the absence of positional signals, the attention block is permutation-equivariant over sequence positions and thus has no intrinsic notion of temporal order beyond causal masking. We argue that additive positional embeddings make the attention mechanism only superficially sensitive to sequence order: positional information is entangled with item embedding semantics, propagates weakly in deep architectures, and limits the ability to capture rich sequential patterns. To address these limitations, we introduce a kernelized self-attention mechanism, where a learnable positional kernel operates purely in the position space, disentangled from semantic similarity, and directly modulates attention weights. When applied per attention block, this kernel enables adaptive multi-scale sequential modeling. Experiments on standard next-item prediction benchmarks show that our positional kernel attention consistently improves over strong competing bas...
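The mechanism described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: a fixed RBF kernel over relative positions stands in for the learnable positional kernel, and multiplicative modulation of the attention weights followed by renormalization is one plausible reading of "directly modulates attention weights". The key property the paper describes is preserved: the kernel operates purely in position space, separate from the semantic query-key similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def positional_kernel(seq_len, length_scale=2.0):
    """Kernel over relative positions (i - j). Here a fixed RBF stands in
    for the learnable per-block kernel described in the abstract."""
    idx = np.arange(seq_len)
    rel = idx[:, None] - idx[None, :]
    return np.exp(-(rel ** 2) / (2.0 * length_scale ** 2))

def kernelized_causal_attention(x, Wq, Wk, Wv, length_scale=2.0):
    """Single-head causal attention whose weights are modulated by a
    positional kernel, disentangled from semantic q.k similarity."""
    seq_len, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Semantic similarity only: no positional signal in the embeddings.
    logits = (q @ k.T) / np.sqrt(d)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    logits = np.where(causal, logits, -np.inf)
    weights = softmax(logits, axis=-1)

    # Positional kernel modulates the attention weights directly, purely
    # in position space; renormalize so rows remain distributions.
    weights = weights * positional_kernel(seq_len, length_scale)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Applying a separate kernel (e.g. a different `length_scale`) per attention block would give each layer its own positional receptive field, which is one way to read the "adaptive multi-scale sequential modeling" claim.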