[2512.07805] Group Representational Position Encoding
Summary
The paper introduces GRAPE (Group Representational Position Encoding), a framework that unifies the two main families of positional encoding, multiplicative rotations and additive logit biases, under group actions, with applications to long-context models in machine learning.
Why It Matters
GRAPE offers a unified approach to positional encoding, which is crucial for improving the performance of models dealing with long sequences. By integrating existing methods like RoPE and ALiBi, it provides a more comprehensive framework that can enhance various applications in natural language processing and beyond.
Key Takeaways
- GRAPE combines multiplicative and additive positional encoding methods.
- It extends existing positional encoding methods such as RoPE and ALiBi, recovering both as special cases.
- The framework supports efficient processing of long-context data.
- It introduces a principled design space for positional geometry.
- Non-commuting mixtures of generators capture cross-subspace feature coupling at modest per-head cost.
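The additive side of the framework can be illustrated with an ALiBi-style distance bias added to attention logits before the softmax. This is a minimal sketch, not the paper's exact parameterization: the function name `additive_bias` and the geometric slope schedule `2^(-8h/H)` are illustrative assumptions.

```python
import numpy as np

def additive_bias(seq_len, num_heads):
    """ALiBi-style additive positional logits: each head adds a linear
    penalty proportional to query-key distance to its attention scores.
    Slopes follow a geometric schedule (an illustrative choice)."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = np.abs(i - j)                       # relative distance |i - j|
    return -slopes[:, None, None] * dist       # shape (H, T, T)

# Usage: per head h, logits = q @ k.T / sqrt(d_head) + additive_bias(T, H)[h]
```

Because the bias depends only on `i - j`, it is translation-invariant, which is the relative-position property the additive family shares with the multiplicative one.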
Computer Science > Machine Learning
arXiv:2512.07805 (cs)
[Submitted on 8 Dec 2025 (v1), last revised 19 Feb 2026 (this version, v3)]
Title: Group Representational Position Encoding
Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
Abstract: We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n) = \exp(n \, \omega \, \mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1...
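The multiplicative map described in the abstract can be sketched directly: for a rank-2 skew-symmetric generator $\mathbf{L} = \mathbf{u}\mathbf{v}^\top - \mathbf{v}\mathbf{u}^\top$ with orthonormal $\mathbf{u}, \mathbf{v}$, we have $\mathbf{L}^3 = -\mathbf{L}$, so $\exp(\theta \mathbf{L})$ has a Rodrigues-type closed form. The sketch below, with hypothetical function names, checks the properties the abstract claims (compositionality and norm preservation); it is not the paper's implementation.

```python
import numpy as np

def rank2_skew_generator(d, rng):
    """Random rank-2 skew-symmetric generator L = u v^T - v u^T with
    orthonormal u, v. Then L^2 = -(u u^T + v v^T) and L^3 = -L."""
    q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # orthonormal columns
    u, v = q[:, 0], q[:, 1]
    return np.outer(u, v) - np.outer(v, u)

def grape_mult(n, omega, L):
    """G(n) = exp(n * omega * L), evaluated in closed form using
    exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2,
    which holds because L^3 = -L (rotation by theta in the u-v plane)."""
    theta = n * omega
    d = L.shape[0]
    return np.eye(d) + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)
```

With `u, v` chosen as canonical coordinate pairs `e_{2k}, e_{2k+1}` and a log-uniform frequency spectrum over the $d/2$ planes, this reduces to the familiar RoPE block-diagonal rotation.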