[2509.25095] Benchmarking ECG FMs: A Reality Check Across Clinical Tasks

arXiv - Machine Learning · March 05, 2026

About this article

arXiv:2509.25095 (eess), Electrical Engineering and Systems Science > Signal Processing
Submitted on 29 Sep 2025 (v1); last revised 4 Mar 2026 (this version, v2)

Authors: M A Al-Masud, Juan Miguel Lopez Alcaraz, Nils Strodthoff

Abstract: The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. Foundation models (FMs) promise broader adaptability, but fundamental questions remain: Which architectures generalize best? How do models scale with limited labels? What explains performance differences across model families? We benchmarked eight ECG FMs on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes. Results show heterogeneous performance across domains: in adult ECG interpretation, three FMs consistently outperformed strong supervised baselines. In contrast, ECG-CPC, a compact structured state-space model, dominated 5 of 7 task categories, demonstrating that architecture matters more than scale. FMs improved label efficiency 3.3-9x over supervised baselines, though scaling behaviors varied across...
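The abstract reports a 3.3-9x label-efficiency gain but does not define the metric in this excerpt. The sketch below assumes one common definition: the number of labeled training samples a supervised baseline needs to reach a target score, divided by the number the FM needs, with each count read off a learning curve by linear interpolation in log(dataset size). The functions `labels_needed` and `label_efficiency` and the toy curves are hypothetical illustrations, not the authors' code or results.

```python
import math


def labels_needed(sizes, scores, target):
    """Hypothetical helper: estimate how many labels a learning curve needs to
    reach `target`, interpolating linearly in log(size) between the two
    measured points that bracket it. Assumes scores increase with size.
    Returns None if the curve never reaches the target (no extrapolation)."""
    if len(sizes) != len(scores) or len(sizes) < 2:
        raise ValueError("need at least two matched (size, score) points")
    for i, score in enumerate(scores):
        if score >= target:
            if i == 0:
                # Target already met at the smallest measured size; the true
                # requirement may be lower, so this is an upper bound.
                return float(sizes[0])
            s0, s1 = scores[i - 1], score
            x0, x1 = math.log(sizes[i - 1]), math.log(sizes[i])
            frac = (target - s0) / (s1 - s0)
            return math.exp(x0 + frac * (x1 - x0))
    return None


def label_efficiency(sizes, baseline_scores, fm_scores, target):
    """Assumed definition: baseline labels needed / FM labels needed."""
    n_base = labels_needed(sizes, baseline_scores, target)
    n_fm = labels_needed(sizes, fm_scores, target)
    if n_base is None or n_fm is None:
        return None
    return n_base / n_fm


if __name__ == "__main__":
    # Synthetic learning curves for illustration only (not from the paper).
    sizes = [100, 1_000, 10_000]
    baseline = [0.70, 0.80, 0.90]
    fm = [0.80, 0.88, 0.92]
    print(f"{label_efficiency(sizes, baseline, fm, target=0.85):.2f}")  # prints 7.50
```

With these synthetic curves the FM reaches 0.85 with about 422 labels versus about 3,162 for the baseline, a factor of roughly 7.5. Interpolating in log space suits scaling analyses that sample dataset sizes on a geometric grid; a curve that never reaches the target yields None rather than an extrapolated guess.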

Originally published on March 05, 2026. Curated by AI News.
