[2602.21550] Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction

Summary

This article presents a novel approach to gene expression prediction by integrating multimodal epigenomic signals, challenging the reliance on extended sequence lengths in current models.

Why It Matters

Gene expression prediction is central to understanding biological processes and disease. This research shows the limits of approaches that rely on ever-longer input sequences and suggests that integrating proximal multimodal epigenomic signals can improve predictive performance, with potential benefits for genomic research and personalized medicine.

Key Takeaways

  • For current models, long-sequence modeling can degrade gene expression prediction performance.
  • Proximal multimodal epigenomic signals near target genes matter more than extended sequence context.
  • The proposed Prism framework effectively integrates high-dimensional epigenomic features.
  • Proper modeling can achieve state-of-the-art results using only short sequences.
  • Understanding distinct biological roles of signal types can mitigate confounding effects.

arXiv:2602.21550 (cs) · Computer Science > Machine Learning · Submitted on 25 Feb 2026

Title: Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction

Authors: Zhao Yang, Yi Duan, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su

Abstract: Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, which has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may lead models to develop spurious associations with these background patterns. To address this ...
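The "simple concatenation" baseline that the abstract cautions against can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code and not the Prism framework: the function names, the track names `signal_a` and `signal_b`, and the five-base window are all hypothetical. It only shows what stacking per-position epigenomic tracks onto a one-hot DNA encoding looks like, with every track treated identically regardless of its biological role.

```python
# Illustrative sketch only: all names and values here are assumptions,
# not taken from the paper.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (unknown bases such as N become all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def concat_features(seq, tracks):
    """Channel-wise concatenation of one-hot sequence with per-position signal tracks.

    tracks maps a (hypothetical) track name to one float per base.
    Returns one row per base, each of length 4 + number of tracks.
    """
    names = sorted(tracks)  # fix the channel order so rows are comparable
    for name in names:
        if len(tracks[name]) != len(seq):
            raise ValueError(f"track {name!r} length does not match sequence")
    encoded = one_hot(seq)
    # Every track is appended the same way; nothing separates signals that mark
    # active regulatory elements from those reflecting background chromatin.
    return [row + [tracks[n][i] for n in names] for i, row in enumerate(encoded)]


# A tiny proximal window with two placeholder epigenomic tracks.
window = "ACGTN"
signals = {
    "signal_a": [0.0, 0.5, 1.2, 0.3, 0.0],
    "signal_b": [0.1, 0.9, 2.0, 0.4, 0.0],
}
features = concat_features(window, signals)
```

Because all channels enter the model on equal footing, a downstream predictor is free to latch onto whichever track correlates with expression, which is the kind of spurious association the abstract describes.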
