[2603.29913] SISA: A Scale-In Systolic Array for GEMM Acceleration
Computer Science > Hardware Architecture
arXiv:2603.29913 (cs)
[Submitted on 31 Mar 2026]

Title: SISA: A Scale-In Systolic Array for GEMM Acceleration
Authors: Luigi Altamura, Alessio Cicero, Mateo Vázquez Maceiras, Mohammad Ali Maleki, Pedro Trancoso

Abstract: The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. Thus, most systems are equipped with dedicated matrix hardware accelerators based on square Systolic Arrays (SAs) of Processing Elements (PEs). While this organization was effective for traditional Deep Neural Networks (DNNs), LLMs introduce input-dependent and highly skewed matrices, leading to underutilized SA resources. To address this challenge, we propose SISA (Scale-In Systolic Array), a novel SA architecture that partitions the traditional square array into horizontal rectangular slabs. With minimal overhead, SISA exposes parallelism through independently scheduled slabs for efficient execution of small or skewed matrix shapes, while retaining full-array operation for large GEMMs. SISA achieves up to 8.52x speedup and 93% energy-delay-product (EDP) reduction for representative LLMs compared to a state-of-the-art monolithic SA with the same number of PEs.

Subjects: Hardware Architecture (cs.AR); A...
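The underutilization argument in the abstract can be illustrated with a back-of-the-envelope tiling model. This is a minimal sketch under our own assumptions: the array sizes (a 128x128 monolithic SA vs. four independently scheduled 32x128 slabs), the skewed output shape, and the simple tile-pass utilization model are illustrative and are not figures taken from the paper.

```python
# Illustrative sketch (not from the paper): why horizontal slabs can raise
# PE utilization for skewed GEMM output shapes, assuming an output-tiled
# mapping and ceiling-division tile counts.

def tiles_needed(M, N, rows, cols):
    """Tile passes a rows x cols PE array needs to cover an M x N output
    matrix (ceiling division in each dimension)."""
    return -(-M // rows) * -(-N // cols)

def utilization(M, N, rows, cols):
    """Fraction of PE slots doing useful work across all tile passes."""
    return (M * N) / (tiles_needed(M, N, rows, cols) * rows * cols)

# Hypothetical skewed output shape, e.g. a small-batch LLM projection:
M, N = 32, 512

mono = utilization(M, N, 128, 128)  # one monolithic square array
slab = utilization(M, N, 32, 128)   # one 32x128 slab; four run in parallel

print(f"monolithic 128x128 utilization: {mono:.2f}")  # 0.25
print(f"32x128 slab utilization:        {slab:.2f}")  # 1.00
```

Under this toy model the monolithic array wastes three quarters of its PEs on the short M dimension, while each slab is fully occupied and the four slabs can be scheduled on independent small GEMMs, which is the intuition behind the scale-in design.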