[2603.18062] S3T-Former: A Purely Spike-Driven State-Space Topology Transformer for Skeleton Action Recognition
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.18062 (cs)
[Submitted on 18 Mar 2026 (v1), last revised 20 Mar 2026 (this version, v2)]
Authors: Naichuan Zheng, Hailun Xia, Zepeng Sun, Weiyi Li, Yujia Wang

Abstract: Skeleton-based action recognition is crucial for multimedia applications but relies heavily on power-hungry Artificial Neural Networks (ANNs), limiting deployment on resource-constrained edge devices. Spiking Neural Networks (SNNs) offer an energy-efficient alternative; however, existing spiking models for skeleton data often compromise the intrinsic sparsity of SNNs by resorting to dense matrix aggregations, heavy multimodal fusion modules, or non-sparse frequency-domain transformations. Furthermore, they suffer severely from the short-term amnesia of spiking neurons. In this paper, we propose the Spiking State-Space Topology Transformer (S3T-Former), which, to the best of our knowledge, is the first purely spike-driven Transformer architecture specifically designed for energy-efficient skeleton action recognition. Rather than relying on heavy fusion overhead, we formulate a Multi-Stream Anatomical Spiking Embedding (M-ASE) that acts a...
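The abstract refers to the "short-term amnesia" of spiking neurons. A minimal leaky integrate-and-fire (LIF) sketch (parameters hypothetical, not taken from the paper) illustrates the effect: the hard reset that follows a spike discards the accumulated membrane potential, erasing the neuron's temporal context.

```python
def lif_step(v, x, decay=0.5, threshold=0.8):
    """One step of a leaky integrate-and-fire (LIF) neuron.

    The membrane potential decays, integrates the input, and is
    hard-reset to zero after a spike -- discarding all accumulated
    history (the "short-term amnesia" the abstract mentions).
    Parameters are illustrative, not from the paper.
    """
    v = decay * v + x            # leak, then integrate input
    spike = 1 if v >= threshold else 0
    if spike:
        v = 0.0                  # hard reset: prior context is erased
    return v, spike

v, spikes = 0.0, []
for x in [0.6, 0.6, 0.0, 0.3]:
    v, s = lif_step(v, x)
    spikes.append(s)
# spikes -> [0, 1, 0, 0]: after the spike at step 2 the neuron
# restarts from v = 0, regardless of what it integrated before.
```

The reset-induced loss of state is one reason spiking models struggle with long skeleton sequences, which motivates the paper's state-space components.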