[2511.10696] $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Computer Science > Computation and Language
arXiv:2511.10696 (cs)
[Submitted on 12 Nov 2025 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: $\pi$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Authors: Dong Liu, Yanxuan Yu

Abstract: Transformers have revolutionized natural language processing, but their quadratic complexity with respect to sequence length remains a fundamental bottleneck for long-range modeling. While sparse attention mechanisms like RingAttention reduce computational costs by restricting attention to local neighborhoods, they suffer from limited receptive fields and a lack of adaptability. We present $\pi$-Attention, a periodic sparse Transformer that factorizes attention into ring-local neighborhoods, deterministic $\pi$-stride skips, and an adaptive fusion gate. The periodic structure provides predictable coverage of distant tokens, while the sparse footprint keeps the per-layer complexity linear in context length. We prove that $\pi$-Attention achieves $\mathcal{O}(kL + \pi \log L)$ receptive field growth compared to $\mathcal{O}(kL)$ for RingAttention, where $k$ is the local window size, $\pi$ is the skip period, and $L$ is the sequence length. Extensive experiments on language modeling, retrieval, and vision-language tasks demonstrate that $\pi$-Attention ...
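The abstract describes attention factorized into a ring-local window plus deterministic $\pi$-stride skips. A minimal sketch of what such a sparse causal mask could look like is below; the function name, the exact skip anchoring (every $\pi$-th absolute position), and the omission of the adaptive fusion gate are all assumptions for illustration, not the paper's construction:

```python
def pi_attention_mask(L, k, pi):
    """Illustrative causal mask: each query i attends to its k most recent
    tokens (ring-local window) plus every pi-th earlier position (skips).
    Per-row cost is O(k + L/pi), i.e. linear in L with a small constant."""
    mask = [[False] * L for _ in range(L)]
    for i in range(L):
        for j in range(max(0, i - k), i + 1):   # ring-local neighborhood
            mask[i][j] = True
        for j in range(0, i + 1, pi):           # deterministic pi-stride skips
            mask[i][j] = True
    return mask

mask = pi_attention_mask(L=16, k=2, pi=4)
# The densest row touches at most (k + 1) local slots plus ceil(L / pi) skips,
# far fewer than the L slots of full attention.
print(max(sum(row) for row in mask))
```

The sparsity here is fixed per row, which is why the per-layer complexity stays linear in context length even as the skips reach arbitrarily distant tokens.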