[2602.19322] US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound
Summary
The paper presents US-JEPA, a self-supervised framework for medical ultrasound imaging that learns representations by predicting masked latent representations rather than raw pixels, which sidesteps the speckle noise and low signal-to-noise ratio that hinder pixel-level reconstruction objectives.
Why It Matters
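Ultrasound's low signal-to-noise ratio and stochastic speckle patterns make pixel-reconstruction pretraining unreliable, and standard JEPAs depend on online teachers updated via exponential moving average, which are hyperparameter-brittle and computationally expensive. US-JEPA replaces the online teacher with a frozen, domain-specific one. If the resulting representations prove more robust, ultrasound analysis models could become more reliable, with potential downstream benefits for diagnosis and patient care.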
Key Takeaways
- US-JEPA adopts the Static-teacher Asymmetric Latent Training (SALT) objective, in which a frozen, domain-specific teacher provides stable latent targets for the student.
- The framework shows competitive performance against existing ultrasound models on the UltraBench dataset.
- Masked latent prediction is proposed as a more efficient method for robust ultrasound representation learning.
- The paper provides a comprehensive comparison of state-of-the-art ultrasound models, contributing to the field's understanding.
- This research could lead to improved diagnostic capabilities in medical imaging.
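The difference between an online teacher and a static teacher can be illustrated with a toy weight update. The sketch below is not the authors' code: the function names, the flat weight lists, and the momentum value are assumptions chosen for illustration. It shows only the standard exponential moving average rule next to a frozen teacher that ignores the student.

```python
def ema_update(teacher, student, momentum=0.996):
    """Online teacher: t <- momentum * t + (1 - momentum) * s for each weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def static_update(teacher, student):
    """Static (frozen) teacher: weights stay fixed regardless of the student."""
    return list(teacher)


# Hypothetical two-parameter "networks", purely for illustration.
teacher = [1.0, -2.0]
student = [0.0, 0.0]

ema_teacher = ema_update(teacher, student, momentum=0.9)
frozen_teacher = static_update(teacher, student)
```

With an EMA teacher, the targets drift as the student trains and the momentum schedule becomes a hyperparameter to tune. With a frozen teacher, the targets are fixed for the whole run, which is the decoupling the summary refers to.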
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.19322 (cs)
[Submitted on 22 Feb 2026]
Title: US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound
Authors: Ashwath Radhachandran, Vedrana Ivezić, Shreeram Athreya, Ronit Anilkumar, Corey W. Arnold, William Speier
Abstract: Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. The low signal-to-noise ratio and stochastic speckle patterns hinder standard self-supervised learning methods relying on a pixel-level reconstruction objective. Joint-Embedding Predictive Architectures (JEPAs) address this drawback by predicting masked latent representations rather than raw pixels. However, standard approaches depend on hyperparameter-brittle and computationally expensive online teachers updated via exponential moving average. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective. By using a frozen, domain-specific teacher to provide stable latent targets, US-JEPA decouples student-teacher optimization and pushes the student to expand upon the semantic priors of the teacher. In addition, we provide the first rigorous comparison of all publicly available state-of-the-art ultrasound fo...
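Masked latent prediction against a frozen teacher, as the abstract describes it, can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration rather than the paper's method: patches are scalars, each "encoder" is a single scalar weight, and the predictor guesses every masked latent from the mean of the visible context latents. The point is only where the loss lives: in latent space, on masked positions, with targets from a teacher that is never updated.

```python
def encode(patches, weight):
    """Toy encoder: map each scalar patch to a scalar latent."""
    return [weight * p for p in patches]


def masked_latent_loss(patches, mask, student_w, predictor_w, teacher_w):
    """Mean squared error between predicted and teacher latents on masked patches.

    mask[i] is True where patch i is hidden from the student.
    """
    # The frozen teacher encodes the full, unmasked input to produce targets.
    targets = encode(patches, teacher_w)

    # The student encodes only the visible patches.
    visible = [p for p, m in zip(patches, mask) if not m]
    context = encode(visible, student_w)

    # Hypothetical predictor: one prediction shared by all masked positions,
    # derived from the mean of the visible context latents.
    prediction = predictor_w * (sum(context) / len(context))

    # The loss compares latents, not pixels, and only at masked positions.
    masked_targets = [t for t, m in zip(targets, mask) if m]
    errors = [(prediction - t) ** 2 for t in masked_targets]
    return sum(errors) / len(errors)


patches = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, False, True]
loss = masked_latent_loss(patches, mask, student_w=0.5, predictor_w=2.0, teacher_w=1.0)
```

In a real training loop only `student_w` and `predictor_w` would receive gradients; `teacher_w` stays constant, so the targets do not move while the student learns.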