[2510.04008] RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Summary
The paper presents RACE Attention, an attention mechanism whose cost is strictly linear in sequence length, enabling long-sequence training at far lower cost than quadratic Softmax Attention.
Why It Matters
As the demand for processing longer sequences in machine learning increases, RACE Attention provides a scalable solution that enhances performance while reducing computational costs. This innovation is crucial for advancing applications in natural language processing and other AI fields where long-context understanding is essential.
Key Takeaways
- RACE Attention achieves linear time complexity, making it suitable for long sequences.
- It matches or outperforms strong baselines in accuracy while being faster and more memory-efficient.
- The mechanism allows processing of up to 75 million tokens in a single pass on specific hardware.
- RACE Attention uses Gaussian random projections and soft Locality-Sensitive Hashing to optimize performance.
- The code for RACE Attention is publicly available, promoting further research and application.
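To make the bucketing idea concrete, here is a toy sketch of attention approximated via Gaussian random projections and soft LSH. This is an illustrative implementation under assumptions of my own (the function name, the softmax-based soft bucket assignment, and all parameters such as `num_buckets`, `num_repeats`, and `temperature` are hypothetical), not the paper's exact RACE algorithm; it only demonstrates why this style of method is linear in sequence length.

```python
import numpy as np

def soft_lsh_attention(Q, K, V, num_buckets=16, num_repeats=4,
                       temperature=0.25, seed=0):
    """Toy linear-time attention via soft LSH bucketing.

    Hypothetical sketch: keys deposit their values into buckets defined by
    Gaussian random projections; queries read back a weighted average.
    Cost is O(n * num_buckets * d) per repeat -- linear in sequence length n.
    """
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    out = np.zeros_like(V, dtype=float)
    for _ in range(num_repeats):
        # One Gaussian random projection per repeat defines the bucket scores
        # (a proxy for angular similarity between vectors).
        W = rng.standard_normal((d, num_buckets))

        def soft_assign(X):
            # Softmax over projection scores = "soft" bucket membership.
            scores = (X @ W) / temperature
            scores -= scores.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(scores)
            return p / p.sum(axis=1, keepdims=True)      # (n, num_buckets)

        a_q = soft_assign(Q)
        a_k = soft_assign(K)
        # Aggregate values and total mass per bucket in a single linear pass.
        bucket_vals = a_k.T @ V        # (num_buckets, d_v)
        bucket_mass = a_k.sum(axis=0)  # (num_buckets,)
        # Each query reads a normalized mixture of its buckets' contents;
        # the full n x n attention matrix is never materialized.
        num = a_q @ bucket_vals        # (n, d_v)
        den = a_q @ bucket_mass        # (n,)
        out += num / den[:, None]
    return out / num_repeats
```

Each output row is a convex combination of value rows, so the result stays within the range of `V`; increasing `num_repeats` averages independent hash tables to reduce approximation variance.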
Computer Science > Machine Learning
arXiv:2510.04008 (cs)
[Submitted on 5 Oct 2025 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Authors: Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava
Abstract: Softmax Attention has a quadratic time complexity in sequence length, which becomes prohibitive to run at long contexts, even with highly optimized GPU kernels. For example, FlashAttention-2/3 (exact, GPU-optimized implementations of Softmax Attention) cannot complete a single forward-backward pass of a single attention layer once the context exceeds ~4 million tokens on an NVIDIA GH200 (96 GB). We introduce Repeated Arrays-of-Count Estimators (RACE) Attention, a kernel-inspired alternative to Softmax Attention that is strictly linear in sequence length and embedding size. RACE Attention replaces the exponential kernel with a sharpened angular similarity, and approximates attention outputs via Gaussian random projections and soft Locality-Sensitive Hashing (LSH), avoiding construction of the full attention matrix. Across language modeling, masked language modeling, and text/image classification, RACE Attention matches or outperforms strong baselines up to 64K sequence length while re…
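The abstract's "kernel-inspired" framing follows the standard kernelization identity used by linear-attention methods in general (stated here as background, not as the paper's exact formulation): if the exponential kernel is replaced by a factorizable similarity $\phi(q)^\top \psi(k)$, the sums over keys can be precomputed once.

```latex
\mathrm{Attn}(q_i)
  \;=\; \frac{\sum_{j} \exp(q_i^\top k_j)\, v_j}{\sum_{j} \exp(q_i^\top k_j)}
  \;\approx\;
  \frac{\phi(q_i)^\top \left(\sum_{j} \psi(k_j)\, v_j^\top\right)}
       {\phi(q_i)^\top \left(\sum_{j} \psi(k_j)\right)}
```

Because $\sum_j \psi(k_j) v_j^\top$ and $\sum_j \psi(k_j)$ are computed once and shared by all queries, the total cost is linear in sequence length instead of quadratic; RACE's sharpened angular similarity with soft LSH plays the role of the feature maps $\phi, \psi$ here.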