[2603.01960] TiledAttention: a CUDA Tile SDPA Kernel for PyTorch
Computer Science > Machine Learning
arXiv:2603.01960 (cs) [Submitted on 2 Mar 2026]

Title: TiledAttention: a CUDA Tile SDPA Kernel for PyTorch
Authors: Taimur Khan

Abstract: TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled $K,V$ streaming. The approach is both performant and directly editable at the schedule level from Python (tile shapes, staging, shared-memory layout), enabling rapid, reproducible kernel research without template-heavy CUDA/CUTLASS rewrites. We benchmark TiledAttention on an NVIDIA DGX GB10 node with a reproducible harness, comparing against PyTorch SDPA (auto-dispatch) and explicit unfused baselines across sequence length, head dimension, and precision (FP16/BF16). While production fused baselines remain stronger overall, TiledAttention delivers large speedups over standard eager attention paths and is available for direct use within PyTorch workflows, offering a practical balance between performance and customizability.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.01960 [cs.LG] (or arXiv:2603.01960v1 [cs.LG] for this version)
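The abstract's two key kernel techniques, online softmax and tiled $K,V$ streaming, can be illustrated with a minimal NumPy sketch (not the paper's cuTile implementation; all names here are illustrative). The tiled version streams $K,V$ in blocks and maintains a running row maximum, normalizer, and output accumulator, so the full score matrix is never materialized:

```python
import numpy as np

def sdpa_reference(q, k, v):
    """Naive SDPA: materializes the full [Lq, Lk] score matrix."""
    s = (q @ k.T) / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def sdpa_online(q, k, v, tile=32):
    """Online-softmax SDPA: streams K/V in tiles of `tile` rows.

    Per query row it keeps a running max `m`, running softmax
    normalizer `l`, and unnormalized output accumulator `acc`,
    rescaling the old statistics whenever the max increases.
    """
    Lq, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((Lq, 1), -np.inf)       # running row max
    l = np.zeros((Lq, 1))               # running normalizer
    acc = np.zeros((Lq, v.shape[-1]))   # unnormalized output
    for j in range(0, k.shape[0], tile):
        kj, vj = k[j:j + tile], v[j:j + tile]
        s = (q @ kj.T) * scale                          # tile of scores
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)                           # tile-local exp
        corr = np.exp(m - m_new)                        # rescale factor
        l = l * corr + p.sum(axis=-1, keepdims=True)
        acc = acc * corr + p @ vj
        m = m_new
    return acc / l                                      # normalize once
```

The tiled loop produces results identical (up to floating-point rounding) to the reference, which is what lets a real kernel stream $K,V$ tiles through shared memory while keeping only per-row scalars resident.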