[2602.14050] Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
Summary
This paper introduces a novel position encoding strategy, Random Float Sampling (RFS), which enhances the length generalization capabilities of Transformers, showing improved performance on unseen input lengths.
Why It Matters
Length generalization is crucial for the effectiveness of language models, especially as they encounter longer inputs than those seen during training. The proposed RFS method addresses out-of-distribution issues, making it a significant advancement for applications in natural language processing and machine learning.
Key Takeaways
- Random Float Sampling (RFS) improves length generalization in Transformers.
- RFS avoids out-of-distribution issues by using continuous values for position indices.
- The method can be integrated with existing position encodings like sinusoidal and ALiBi.
- Experiments demonstrate RFS's effectiveness in length generalization tasks.
- RFS also enhances performance in zero-shot commonsense reasoning benchmarks.
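The core idea in the takeaways — replacing discrete position indices with randomly sampled continuous values — can be sketched in a few lines. This is a hypothetical reconstruction, not the paper's implementation: the sampling range `max_pos`, the uniform distribution, and the function names are all assumptions; the sinusoidal formula itself accepts float positions without modification, which is what makes the combination natural.

```python
import numpy as np

def random_float_positions(seq_len, max_pos, rng):
    # Hypothetical sketch of RFS: sample seq_len continuous positions
    # uniformly from [0, max_pos) and sort them so token order is preserved.
    # The actual sampling scheme in the paper may differ.
    return np.sort(rng.uniform(0.0, max_pos, size=seq_len))

def sinusoidal_encoding(positions, d_model):
    # Standard sinusoidal PE evaluated at continuous (float) positions;
    # the sin/cos formula needs no integer indices, so sampled floats
    # can be substituted for the usual 0, 1, 2, ... directly.
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / d_model))     # (d_model/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, d_model/2)
    enc = np.empty((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

rng = np.random.default_rng(0)
pos = random_float_positions(seq_len=16, max_pos=4096.0, rng=rng)
pe = sinusoidal_encoding(pos, d_model=64)   # shape (16, 64)
```

Because training sees many different float positions rather than a fixed discrete grid, positions encountered at longer test lengths are no longer out of distribution — which is the intuition the takeaways describe.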
Computer Science > Machine Learning
arXiv:2602.14050 (cs)
[Submitted on 15 Feb 2026]
Title: Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
Authors: Atsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo
Abstract: Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a common and fundamental procedure in widely used PEs, the advantage of RFS can easily be incorporated into, for instance, the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness by showing that RFS results in superior performance in length generalization tasks as well as zero-shot commonsense reasoning benchmarks.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.14050 [cs.LG]
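The abstract notes that RFS can also be combined with ALiBi, which biases attention scores by a slope times the query-key distance. The sketch below is an assumption-laden illustration of how that combination could look: with continuous positions, the distance matrix is simply the pairwise difference of sampled floats. The slope schedule follows the standard ALiBi geometric sequence; causal masking is omitted for brevity.

```python
import numpy as np

def alibi_bias_float(positions, num_heads):
    # Standard ALiBi head slopes: 2^(-8i/n) for i = 1..n.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # With continuous (float) positions, query-key distances are just
    # pairwise differences of the sampled values; no integer grid needed.
    dist = positions[:, None] - positions[None, :]   # (L, L) signed distances
    # Linear penalty grows with distance; zero on the diagonal.
    return -slopes[:, None, None] * np.abs(dist)     # (num_heads, L, L)

rng = np.random.default_rng(1)
pos = np.sort(rng.uniform(0.0, 2048.0, size=8))     # RFS-style float positions
bias = alibi_bias_float(pos, num_heads=4)           # added to attention logits
```

In practice the bias tensor would be added to the pre-softmax attention logits per head; the point of the sketch is only that nothing in ALiBi's distance penalty requires integer positions.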