[2602.10603] dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning

arXiv - Machine Learning · 4 min read

Summary

The paper presents dnaHNet, a tokenizer-free autoregressive model for genomic sequence learning that segments and models raw nucleotide sequences end-to-end, achieving a more than 3x inference speedup over Transformers along with stronger predictive accuracy than existing genomic models.

Why It Matters

As genomic datasets continue to grow, efficient models like dnaHNet are crucial for advancing bioinformatics. The model targets a core representation tradeoff: fixed-vocabulary tokenizers fragment biologically meaningful motifs, while nucleotide-level models preserve them only at prohibitive long-context cost. Resolving that tradeoff could accelerate research in genetics and molecular biology.

Key Takeaways

  • dnaHNet introduces a tokenizer-free approach for genomic sequences.
  • The model employs a dynamic chunking mechanism for improved efficiency (see the sketch after this list).
  • It outperforms existing models in both speed and predictive accuracy.
  • Recursive chunking yields quadratic FLOP reductions, cutting computational costs and enhancing scalability.
  • Demonstrates superior performance in zero-shot genomic tasks.
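
Since the paper is only excerpted here, the following is a minimal sketch of what boundary-based dynamic chunking can look like, in the spirit of the H-Net line of work by several of the same authors. All names, sizes, and the hard 0.5 threshold are illustrative assumptions, not dnaHNet's implementation.

```python
# Minimal sketch of boundary-based dynamic chunking over raw nucleotides.
# Illustrative only: class names, sizes, and the 0.5 threshold are
# assumptions, not details from the dnaHNet paper.
import torch
import torch.nn as nn

NUCLEOTIDES = {"A": 0, "C": 1, "G": 2, "T": 3}  # tokenizer-free: one id per base

class DynamicChunker(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(NUCLEOTIDES), d_model)
        self.boundary = nn.Linear(d_model, 1)  # per-position chunk-boundary logit

    def forward(self, seq: str) -> torch.Tensor:
        ids = torch.tensor([[NUCLEOTIDES[b] for b in seq]])  # (1, L)
        x = self.embed(ids)                                  # (1, L, d)
        probs = torch.sigmoid(self.boundary(x)).squeeze(-1)  # (1, L)
        # Hard cut for illustration; training would use a straight-through
        # estimator or soft pooling to keep the boundaries differentiable.
        is_boundary = probs > 0.5
        chunks, start = [], 0
        for t in range(ids.size(1)):
            if is_boundary[0, t] or t == ids.size(1) - 1:
                # Pool each variable-length chunk into one latent token.
                chunks.append(x[:, start : t + 1].mean(dim=1))
                start = t + 1
        return torch.stack(chunks, dim=1)  # (1, num_chunks, d), num_chunks <= L

torch.manual_seed(0)
latents = DynamicChunker()("ACGTACGGTACCGT")
print(latents.shape)  # e.g. torch.Size([1, k, 64]), with k adaptive per sequence
```

Per the abstract, dnaHNet's chunking is differentiable and applied recursively, so latent tokens produced at one level can be chunked again at the next; the sketch above shows only a single level.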

Computer Science > Machine Learning · arXiv:2602.10603 (cs)

[Submitted on 11 Feb 2026 (v1), last revised 14 Feb 2026 (this version, v2)]

Title: dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning

Authors: Arnav Shah, Junzhe Li, Parsa Idehpour, Adibvafa Fallahpour, Brandon Wang, Sukjun Hwang, Bo Wang, Patrick D. Hsu, Hani Goodarzi, Albert Gu

Abstract: Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff in their input representation. Standard fixed-vocabulary tokenizers fragment biologically meaningful motifs such as codons and regulatory elements, while nucleotide-level models preserve biological coherence but incur prohibitive computational costs for long contexts. We introduce dnaHNet, a state-of-the-art tokenizer-free autoregressive model that segments and models genomic sequences end-to-end. Using a differentiable dynamic chunking mechanism, dnaHNet compresses raw nucleotides into latent tokens adaptively, balancing compression with predictive accuracy. Pretrained on prokaryotic genomes, dnaHNet outperforms leading architectures including StripedHyena2 in scaling and efficiency. This recursive chunking yields quadratic FLOP reductions, enabling $>3 \times$ inference speedup over Transformers. On zero-shot tasks, dnaHNet a...
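
The abstract's "quadratic FLOP reductions" claim has a simple back-of-envelope reading: self-attention cost grows with the square of sequence length, so compressing a length-L nucleotide sequence into L/k latent tokens shrinks that term by a factor of k squared. A minimal illustration is below; the compression ratio k and the context length are made-up assumptions, not numbers reported in the paper.

```python
# Back-of-envelope for the quadratic saving from chunking.
# k is a hypothetical compression ratio, not a value from the paper.
L = 1_000_000        # raw nucleotide context length
k = 8                # nucleotides pooled into each latent token

attention_flops_raw = L ** 2             # ~L^2 pairwise interactions at base level
attention_flops_chunked = (L // k) ** 2  # same term after compression to L/k tokens

print(attention_flops_raw // attention_flops_chunked)  # 64 == k**2
```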

Related Articles

[2604.01989] Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
Llms

Abstract page for arXiv paper 2604.01989: Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

arXiv - AI · 4 min
[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Llms

Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

arXiv - AI · 4 min
[2603.18545] CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models
Llms

Abstract page for arXiv paper 2603.18545: CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models

arXiv - AI · 4 min
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
Llms

Abstract page for arXiv paper 2509.22367: What Is The Political Content in LLMs' Pre- and Post-Training Data?

arXiv - AI · 4 min