[2602.17162] JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures
Summary
The paper introduces JEPA-DNA, a pre-training framework for genomic foundation models that pairs a Joint-Embedding Predictive Architecture (JEPA) objective with traditional generative objectives, so that learned representations reflect broader functional context as well as local sequence patterns.
Why It Matters
Genomic foundation models trained only with Masked Language Modeling (MLM) or Next Token Prediction (NTP) capture local syntax and fine-grained motifs but often miss broader functional context. JEPA-DNA targets that gap, which matters for applications in medicine and biology that depend on accurate interpretation of genetic data.
Key Takeaways
- JEPA-DNA integrates joint-embedding predictive architectures with traditional generative objectives.
- The framework extends what the model learns from local genomic syntax to broader functional context.
- JEPA-DNA can be used as a standalone model or as an enhancement for existing genomic foundation models.
- Evaluations show superior performance in both supervised and zero-shot tasks compared to generative-only models.
- This approach offers a scalable path for developing foundation models that grasp the functional logic of genomic sequences.
arXiv:2602.17162 (cs), Computer Science > Artificial Intelligence
Submitted on 19 Feb 2026
Authors: Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, Nicole Bussola, Simon Lee, Shane O'Connell, Dung Hoang, Marissa Wirth, Alexander W. Charney, Nati Daniel, Yoli Shavit
Abstract
Genomic Foundation Models (GFMs) have largely relied on Masked Language Modeling (MLM) or Next Token Prediction (NTP) to learn the language of life. While these paradigms excel at capturing local genomic syntax and fine-grained motif patterns, they often fail to capture the broader functional context, resulting in representations that lack a global biological perspective. We introduce JEPA-DNA, a novel pre-training framework that integrates the Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. JEPA-DNA introduces latent grounding by coupling token-level recovery with a predictive objective in the latent space by supervising a CLS token. This forces the model to predict the high-level functional embeddings of masked genomic segments rather than focusing solely on individual nucleotides. JEPA-DNA extends both NTP...
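The coupling the abstract describes, token-level recovery plus a predictive objective on a CLS embedding, can be illustrated with a toy loss computation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which distance is used in latent space or how the two terms are weighted, so the mean squared error, the `latent_weight` parameter, and all function names below are hypothetical. The encoders that would produce the logits and embeddings are outside the sketch, and the inputs are hand-written toy values.

```python
import math

NUCLEOTIDES = "ACGT"

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over masked positions only (token-level recovery)."""
    total, count = 0.0, 0
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        # Numerically stable log-sum-exp for one position.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[NUCLEOTIDES.index(target)]
        count += 1
    return total / count if count else 0.0

def latent_loss(predicted_cls, target_cls):
    """Mean squared error between two CLS embeddings (assumed distance)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_cls, target_cls)]
    return sum(squared) / len(squared)

def combined_loss(logits, targets, mask, predicted_cls, target_cls,
                  latent_weight=1.0):
    """Generative term plus weighted latent term (weighting is an assumption)."""
    generative = masked_cross_entropy(logits, targets, mask)
    latent = latent_loss(predicted_cls, target_cls)
    return generative + latent_weight * latent

# Toy inputs: three positions, the last two masked.
logits = [[2.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 3.0, 0.0]]
targets = "ACG"
mask = [False, True, True]
predicted_cls = [0.5, -0.5]  # would come from a predictor on the masked view
target_cls = [0.5, 0.5]      # would come from a target encoder

print(combined_loss(logits, targets, mask, predicted_cls, target_cls))
```

Setting `latent_weight` to 0 in this sketch reduces it to a generative-only objective, which is the kind of baseline the summary contrasts JEPA-DNA against.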