[2602.17206] SoftDTW-CUDA-Torch: Memory-Efficient GPU-Accelerated Soft Dynamic Time Warping for PyTorch
Summary
The paper presents SoftDTW-CUDA-Torch, an open-source PyTorch library that enhances Soft Dynamic Time Warping (SoftDTW) by improving memory efficiency and numerical stability on GPUs.
Why It Matters
This library matters to machine-learning researchers and practitioners working with time-series data: it removes critical limitations of existing SoftDTW implementations, enabling longer sequences, more memory-efficient computation, and broader applicability.
Key Takeaways
- Introduces a memory-efficient implementation of SoftDTW for PyTorch.
- Eliminates the hard sequence-length cap of 1024 through tiled kernel execution.
- Prevents numerical instability with a log-space backward pass.
- Achieves up to 98% memory reduction compared to previous methods.
- Supports arbitrary sequence lengths and full PyTorch autograd integration.
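To make the takeaways concrete, here is a minimal NumPy sketch of the SoftDTW dynamic program the library accelerates: the classic DTW recursion with the hard minimum replaced by a smoothed soft-minimum controlled by a parameter gamma. This is an illustrative reference implementation, not the paper's tiled CUDA kernel; the function names are ours.

```python
import numpy as np

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))).
    Computed with a max-shift so the exponentials cannot overflow."""
    z = -np.asarray(values, dtype=float) / gamma
    m = z.max()
    return -gamma * (m + np.log(np.exp(z - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    """O(N*M) SoftDTW dynamic program over squared-Euclidean costs:
    R[i, j] = D[i, j] + softmin(R[i-1, j], R[i, j-1], R[i-1, j-1])."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float((x[i - 1] - y[j - 1]) ** 2)
            R[i, j] = d + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[n, m]
```

As gamma approaches 0, soft_dtw approaches the ordinary (hard) DTW distance. The anti-diagonal tiling described in the paper exploits the fact that every cell R[i, j] on a given anti-diagonal depends only on the two previous anti-diagonals, so each diagonal can be computed in parallel on the GPU.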
Computer Science > Machine Learning
arXiv:2602.17206 (cs) [Submitted on 19 Feb 2026]
Title: SoftDTW-CUDA-Torch: Memory-Efficient GPU-Accelerated Soft Dynamic Time Warping for PyTorch
Authors: Ron Shapira Weber, Oren Freifeld
Abstract: We present softdtw-cuda-torch, an open-source PyTorch library for computing Soft Dynamic Time Warping (SoftDTW) on GPUs. Our implementation addresses three key limitations of existing GPU implementations of SoftDTW: a hard sequence-length cap of 1024, numerical instability in the backward pass for small smoothing parameters, and excessive GPU memory consumption from materializing pairwise distance tensors. We introduce (1) tiled anti-diagonal kernel execution that removes the sequence-length constraint, (2) a log-space backward pass that prevents floating-point overflow, and (3) a fused distance-computation mode that eliminates the O(BNM) intermediate distance tensor, achieving up to 98% memory reduction compared to prior work. The library supports arbitrary sequence lengths, full PyTorch autograd integration, and SoftDTW barycenter computation. Code is available at this https URL.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.17206 [cs.LG] (or arXiv:2602.17206v1 [cs.LG] for this version) https://doi.org/10.48550/arXiv.2602.17206
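The abstract's second contribution, a log-space backward pass, addresses a concrete failure mode: the gradient of the soft-minimum is a softmax over the candidate costs, and for small gamma the naive exponentials overflow to inf, producing NaN gradients. The sketch below (our own illustration, not the library's kernel code) contrasts a naive softmax-weight computation with a log-space version that shifts by the maximum before exponentiating.

```python
import numpy as np

def softmin_grad_naive(values, gamma):
    """d softmin / d v_k = exp(-v_k / gamma) / sum_j exp(-v_j / gamma).
    Overflows to inf (hence nan after division) for small gamma."""
    w = np.exp(-np.asarray(values, dtype=float) / gamma)
    return w / w.sum()

def softmin_grad_stable(values, gamma):
    """The same softmax weights, computed in log-space: subtracting the
    maximum exponent keeps every exp() argument at most 0."""
    z = -np.asarray(values, dtype=float) / gamma
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()
```

For well-scaled inputs both functions agree; with large cost values and a small gamma the naive version returns NaN while the stable one still yields valid weights. The paper's log-space backward pass applies the same principle across the full SoftDTW gradient recursion rather than a single soft-minimum.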