[2510.04008] RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training

arXiv - Machine Learning · 4 min read

Summary

The paper presents RACE Attention, a novel linear-time attention mechanism designed for long-sequence training, significantly improving efficiency over traditional Softmax Attention.

Why It Matters

As demand grows for processing longer sequences in machine learning, RACE Attention offers a scalable alternative that preserves model quality while cutting compute and memory costs. This matters for natural language processing and other AI applications where long-context understanding is essential.

Key Takeaways

  • RACE Attention achieves linear time complexity, making it suitable for long sequences.
  • It matches or outperforms strong baselines in accuracy while improving speed and memory efficiency.
  • The mechanism allows processing of up to 75 million tokens in a single pass on specific hardware (see the scale note after this list).
  • RACE Attention uses Gaussian random projections and soft Locality-Sensitive Hashing to optimize performance.
  • The code for RACE Attention is publicly available, promoting further research and application.
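
For scale: at a 4-million-token context, the full attention matrix has (4×10⁶)² ≈ 1.6×10¹³ entries, roughly 32 TB in fp16. Exact kernels such as FlashAttention avoid materializing that matrix, but they still perform a quadratic amount of work, which is why they stall near that context length while a strictly linear-time method can keep scaling.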

Computer Science > Machine Learning · arXiv:2510.04008 (cs)
[Submitted on 5 Oct 2025 (v1), last revised 15 Feb 2026 (this version, v3)]

Title: RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Authors: Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava

Abstract: Softmax Attention has a quadratic time complexity in sequence length, which becomes prohibitive to run at long contexts, even with highly optimized GPU kernels. For example, FlashAttention-2/3 (exact, GPU-optimized implementations of Softmax Attention) cannot complete a single forward-backward pass of a single attention layer once the context exceeds ~4 million tokens on an NVIDIA GH200 (96 GB). We introduce Repeated Arrays-of-Count Estimators (RACE) Attention, a kernel-inspired alternative to Softmax Attention that is strictly linear in sequence length and embedding size. RACE Attention replaces the exponential kernel with a sharpened angular similarity, and approximates attention outputs via Gaussian random projections and soft Locality-Sensitive Hashing (LSH), avoiding construction of the full attention matrix. Across language modeling, masked language modeling, and text/image classification, RACE Attention matches or outperforms strong baselines up to 64K sequence length while re...
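
For intuition, here is a minimal NumPy sketch of the kind of soft-LSH attention the abstract describes: keys deposit their values into soft hash buckets defined by Gaussian random projections, and queries read them back, so no n×n attention matrix is ever formed. This is a reconstruction from the abstract alone, not the authors' released code; the function name and every hyperparameter (n_reps, n_buckets, temperature) are illustrative, the sketch is non-causal, and its softmax-over-projections hashing stands in for the paper's sharpened angular similarity kernel, which it does not reproduce.

import numpy as np

def race_attention_sketch(Q, K, V, n_reps=4, n_buckets=16, temperature=1.0, seed=0):
    """Linear-time approximate attention via soft LSH (illustrative only).
    Q, K: (n, d) queries/keys; V: (n, d_v) values."""
    rng = np.random.default_rng(seed)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for _ in range(n_reps):
        # One set of Gaussian random projections defines n_buckets soft hash buckets.
        W = rng.standard_normal((Q.shape[1], n_buckets))

        def soft_hash(X):
            # Softmax over projected scores gives each row a soft bucket membership.
            s = (X @ W) / temperature
            s = s - s.max(axis=1, keepdims=True)     # for numerical stability
            e = np.exp(s)
            return e / e.sum(axis=1, keepdims=True)  # (n, n_buckets), rows sum to 1

        mK, mQ = soft_hash(K), soft_hash(Q)
        bucket_vals = mK.T @ V        # (n_buckets, d_v): value mass deposited per bucket
        bucket_mass = mK.sum(axis=0)  # (n_buckets,): key mass held by each bucket
        # Each query reads back a normalized, bucket-weighted average of the values.
        out += (mQ @ bucket_vals) / (mQ @ bucket_mass)[:, None]
    return out / n_reps  # average the repeated estimators

# Cost per repetition is O(n * n_buckets * (d + d_v)): linear in sequence
# length n, versus O(n^2 * d) for exact Softmax Attention.
Q = K = V = np.random.default_rng(1).standard_normal((1024, 64))
print(race_attention_sketch(Q, K, V).shape)  # (1024, 64), no 1024x1024 matrix built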

Related Articles

[2604.01989] Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
LLMs · arXiv - AI · 4 min

[2604.01447] Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars
Machine Learning · arXiv - AI · 3 min

[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
LLMs · arXiv - AI · 4 min

[2603.18545] CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models
LLMs · arXiv - AI · 4 min