[2602.14050] Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers

arXiv - Machine Learning 3 min read Article

Summary

This paper introduces a novel position encoding strategy, Random Float Sampling (RFS), which enhances the length generalization capabilities of Transformers, showing improved performance on unseen input lengths.

Why It Matters

Length generalization is crucial for the effectiveness of language models, especially as they encounter longer inputs than those seen during training. The proposed RFS method addresses out-of-distribution issues, making it a significant advancement for applications in natural language processing and machine learning.

Key Takeaways

  • Random Float Sampling (RFS) improves length generalization in Transformers.
  • RFS avoids out-of-distribution issues by using continuous values for position indices.
  • The method can be integrated with existing position encodings like sinusoidal and ALiBi.
  • Experiments demonstrate RFS's effectiveness in length generalization tasks.
  • RFS also enhances performance in zero-shot commonsense reasoning benchmarks.

Computer Science > Machine Learning, arXiv:2602.14050 (cs) [Submitted on 15 Feb 2026]

Title: Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
Authors: Atsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo

Abstract: Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a common and fundamental procedure in widely used PEs, the advantage of RFS can easily be incorporated into, for instance, the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness by showing that RFS results in superior performance in length generalization tasks as well as zero-shot commonsense reasoning benchmarks.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.14050 [cs.LG]
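To make the core idea concrete, here is a minimal sketch of sampling continuous position indices and feeding them to a sinusoidal encoding. This is an illustration, not the paper's implementation: the sampling distribution (uniform over `[0, max_pos)` followed by a sort) and the `max_pos` value are assumptions, since the summary does not specify the exact scheme.

```python
import numpy as np

def sinusoidal_pe(positions, d_model):
    """Standard sinusoidal encoding, evaluated at (possibly non-integer) positions.

    The sin/cos formulas accept any real-valued index, which is what lets
    continuous position sampling plug into this PE unchanged.
    """
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000 ** (2 * i / d_model))     # (d_model/2,)
    angles = positions[:, None] * freqs[None, :]   # (seq_len, d_model/2)
    pe = np.empty((len(positions), d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def rfs_positions(seq_len, max_pos=2048, rng=None):
    """Sample sorted continuous position indices from [0, max_pos).

    Hypothetical sketch of RFS: uniform sampling plus a sort keeps token
    order monotone while exposing the model to indices across the whole
    range, including non-integer values it will meet at longer lengths.
    """
    rng = np.random.default_rng() if rng is None else rng
    return np.sort(rng.uniform(0.0, max_pos, size=seq_len))

# During training, each batch gets freshly sampled float positions:
pos = rfs_positions(seq_len=128, max_pos=2048)
pe = sinusoidal_pe(pos, d_model=64)
print(pe.shape)  # (128, 64)
```

At inference on a longer sequence, the same encoding function can be evaluated at any index within the sampled range, which is the intuition for why the model avoids OOD indices.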

