[2603.01960] TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

arXiv:2603.01960 (cs) [Submitted on 2 Mar 2026]

Title: TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

Authors: Taimur Khan

Abstract: TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled $K,V$ streaming. The approach is both performant and directly editable at the schedule level from Python (tile shapes, staging, shared-memory layout), enabling rapid, reproducible kernel research without template-heavy CUDA/CUTLASS rewrites. We benchmark TiledAttention on an NVIDIA DGX GB10 node with a reproducible harness and compare against PyTorch SDPA (auto-dispatch) and explicit unfused baselines across sequence length, head dimension, and precision (FP16/BF16). While production fused baselines remain stronger overall, TiledAttention delivers large speedups over standard eager attention paths and is available for direct use within PyTorch workflows, providing a practical balance between performance and customizability.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.01960 [cs.LG] (or arXiv:2603.01960v1 [cs.LG] for this version) htt...
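The abstract names two techniques, online softmax and tiled $K,V$ streaming, without showing them. The sketch below illustrates the general idea in plain Python for a single attention head. It is not the paper's cuTile kernel: the function names `naive_attention` and `tiled_attention`, the `tile` parameter, and the list-of-lists layout are all assumptions made for illustration. The tiled version never holds a full row of scores. It keeps a running maximum, a running normalizer, and an output accumulator, and rescales them whenever a new tile raises the maximum.

```python
import math


def naive_attention(q, k, v):
    """Reference path: compute every score for a query row, then softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, tile=2):
    """Stream K and V in tiles of `tile` rows using an online softmax.

    Hypothetical illustration only; names and layout are not from the paper.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        run_max = -math.inf          # largest score seen so far
        run_sum = 0.0                # softmax normalizer relative to run_max
        acc = [0.0] * len(v[0])      # unnormalized weighted sum of V rows
        for start in range(0, len(k), tile):
            k_tile = k[start:start + tile]
            v_tile = v[start:start + tile]
            scores = [scale * sum(a * b for a, b in zip(qi, kj))
                      for kj in k_tile]
            new_max = max(run_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first tile this is exp(-inf) == 0.0.
            correction = math.exp(run_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            run_sum = run_sum * correction + sum(weights)
            acc = [a * correction
                   + sum(w * vj[c] for w, vj in zip(weights, v_tile))
                   for c, a in enumerate(acc)]
            run_max = new_max
        out.append([a / run_sum for a in acc])
    return out
```

Under these assumptions, `tiled_attention(q, k, v, tile=t)` should agree with `naive_attention(q, k, v)` up to floating-point rounding for any tile size, including one that does not divide the number of keys evenly. A GPU kernel would apply the same recurrence to blocks of queries at once and keep the running state in registers or shared memory.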

Originally published on March 03, 2026. Curated by AI News.
