[2510.04008] RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Summary
The paper presents RACE Attention, an attention mechanism whose cost is strictly linear in sequence length, enabling long-sequence training at far lower cost than quadratic Softmax Attention.
Why It Matters
As the demand for processing longer sequences in machine learning increases, RACE Attention provides a scalable solution that enhances performance while reducing computational costs. This innovation is crucial for advancing applications in natural language processing and other AI fields where long-context understanding is essential.
Key Takeaways
- RACE Attention achieves linear time complexity, making it suitable for long sequences.
- It matches or outperforms strong baselines in accuracy while being faster and more memory-efficient.
- The mechanism allows processing of up to 75 million tokens in a single pass on specific hardware.
- RACE Attention uses Gaussian random projections and soft Locality-Sensitive Hashing to optimize performance.
- The code for RACE Attention is publicly available, promoting further research and application.
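To make the bucketing idea concrete, here is a toy sketch of attention approximated via Gaussian random projections and soft LSH. This is an illustrative implementation under assumptions of my own (the function name, the softmax-based soft bucket assignment, and all parameters such as `num_buckets`, `num_repeats`, and `temperature` are hypothetical), not the paper's exact RACE algorithm; it only demonstrates why this style of method is linear in sequence length.

```python
import numpy as np

def soft_lsh_attention(Q, K, V, num_buckets=16, num_repeats=4,
                       temperature=0.25, seed=0):
    """Toy linear-time attention via soft LSH bucketing.

    Hypothetical sketch: keys deposit their values into buckets defined by
    Gaussian random projections; queries read back a weighted average.
    Cost is O(n * num_buckets * d) per repeat -- linear in sequence length n.
    """
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    out = np.zeros_like(V, dtype=float)
    for _ in range(num_repeats):
        # One Gaussian random projection per repeat defines the bucket scores
        # (a proxy for angular similarity between vectors).
        W = rng.standard_normal((d, num_buckets))

        def soft_assign(X):
            # Softmax over projection scores = "soft" bucket membership.
            scores = (X @ W) / temperature
            scores -= scores.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(scores)
            return p / p.sum(axis=1, keepdims=True)      # (n, num_buckets)

        a_q = soft_assign(Q)
        a_k = soft_assign(K)
        # Aggregate values and total mass per bucket in a single linear pass.
        bucket_vals = a_k.T @ V        # (num_buckets, d_v)
        bucket_mass = a_k.sum(axis=0)  # (num_buckets,)
        # Each query reads a normalized mixture of its buckets' contents;
        # the full n x n attention matrix is never materialized.
        num = a_q @ bucket_vals        # (n, d_v)
        den = a_q @ bucket_mass        # (n,)
        out += num / den[:, None]
    return out / num_repeats
```

Each output row is a convex combination of value rows, so the result stays within the range of `V`; increasing `num_repeats` averages independent hash tables to reduce approximation variance.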
Computer Science > Machine Learning
arXiv:2510.04008 (cs)
[Submitted on 5 Oct 2025 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Authors: Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava
Abstract: Softmax Attention has a quadratic time complexity in sequence length, which becomes prohibitive to run at long contexts, even with highly optimized GPU kernels. For example, FlashAttention-2/3 (exact, GPU-optimized implementations of Softmax Attention) cannot complete a single forward-backward pass of a single attention layer once the context exceeds ~4 million tokens on an NVIDIA GH200 (96 GB). We introduce Repeated Arrays-of-Count Estimators (RACE) Attention, a kernel-inspired alternative to Softmax Attention that is strictly linear in sequence length and embedding size. RACE Attention replaces the exponential kernel with a sharpened angular similarity, and approximates attention outputs via Gaussian random projections and soft Locality-Sensitive Hashing (LSH), avoiding construction of the full attention matrix. Across language modeling, masked language modeling, and text/image classification, RACE Attention matches or outperforms strong baselines up to 64K sequence length while re…
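The abstract's "kernel-inspired" framing follows the standard kernelization identity used by linear-attention methods in general (stated here as background, not as the paper's exact formulation): if the exponential kernel is replaced by a factorizable similarity $\phi(q)^\top \psi(k)$, the sums over keys can be precomputed once.

```latex
\mathrm{Attn}(q_i)
  \;=\; \frac{\sum_{j} \exp(q_i^\top k_j)\, v_j}{\sum_{j} \exp(q_i^\top k_j)}
  \;\approx\;
  \frac{\phi(q_i)^\top \left(\sum_{j} \psi(k_j)\, v_j^\top\right)}
       {\phi(q_i)^\top \left(\sum_{j} \psi(k_j)\right)}
```

Because $\sum_j \psi(k_j) v_j^\top$ and $\sum_j \psi(k_j)$ are computed once and shared by all queries, the total cost is linear in sequence length instead of quadratic; RACE's sharpened angular similarity with soft LSH plays the role of the feature maps $\phi, \psi$ here.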