[2602.10603] dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Summary
The paper presents dnaHNet, a tokenizer-free autoregressive foundation model for genomic sequence learning that reports higher predictive accuracy and a >3× inference speedup over existing architectures.
Why It Matters
As genomic data continues to grow, efficient models like dnaHNet are crucial for advancing bioinformatics. This model addresses key challenges in genomic sequence representation, potentially accelerating research in genetics and molecular biology.
Key Takeaways
- dnaHNet introduces a tokenizer-free approach for genomic sequences.
- The model employs a dynamic chunking mechanism for improved efficiency.
- It outperforms existing models in both speed and predictive accuracy.
- Recursive chunking yields quadratic FLOP reductions, enhancing scalability.
- Demonstrates superior performance in zero-shot genomic tasks.
Computer Science > Machine Learning
arXiv:2602.10603 (cs)
[Submitted on 11 Feb 2026 (v1), last revised 14 Feb 2026 (this version, v2)]
Title: dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Authors: Arnav Shah, Junzhe Li, Parsa Idehpour, Adibvafa Fallahpour, Brandon Wang, Sukjun Hwang, Bo Wang, Patrick D. Hsu, Hani Goodarzi, Albert Gu
Abstract: Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff in their input representation. Standard fixed-vocabulary tokenizers fragment biologically meaningful motifs such as codons and regulatory elements, while nucleotide-level models preserve biological coherence but incur prohibitive computational costs for long contexts. We introduce dnaHNet, a state-of-the-art tokenizer-free autoregressive model that segments and models genomic sequences end-to-end. Using a differentiable dynamic chunking mechanism, dnaHNet compresses raw nucleotides into latent tokens adaptively, balancing compression with predictive accuracy. Pretrained on prokaryotic genomes, dnaHNet outperforms leading architectures including StripedHyena2 in scaling and efficiency. This recursive chunking yields quadratic FLOP reductions, enabling $>3 \times$ inference speedup over Transformers. On zero-shot tasks, dnaHNet a...
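To make the chunking idea concrete, here is a minimal toy sketch of boundary-driven segmentation. This is an illustration inspired by the abstract's description, not the authors' implementation: the `dynamic_chunk` function, its `threshold` parameter, and the hand-picked boundary scores are all assumptions; in the actual model the boundaries would come from a learned, differentiable routing module rather than a hard threshold.

```python
def dynamic_chunk(seq, boundary_scores, threshold=0.5):
    """Split `seq` into variable-length chunks.

    A new chunk starts wherever the per-position boundary score
    exceeds `threshold`, so recurring motifs (e.g. codons) can be
    compressed into single latent tokens of varying length. The
    scores here are given explicitly for illustration; dnaHNet
    learns them end-to-end.
    """
    assert len(seq) == len(boundary_scores)
    chunks, current = [], []
    for base, score in zip(seq, boundary_scores):
        # Close the current chunk when a boundary fires (skip the
        # very first position, which always opens a chunk).
        if current and score > threshold:
            chunks.append("".join(current))
            current = []
        current.append(base)
    if current:
        chunks.append("".join(current))
    return chunks

# Example: high scores at positions 3 and 6 yield codon-like chunks.
seq = "ATGGCTAAA"
scores = [0.9, 0.1, 0.1, 0.8, 0.2, 0.1, 0.9, 0.0, 0.0]
print(dynamic_chunk(seq, scores))  # → ['ATG', 'GCT', 'AAA']
```

Note that chunk lengths adapt to the scores: with different boundaries the same sequence could compress into fewer, longer latent tokens, which is where the FLOP savings over fixed-resolution nucleotide models come from.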