[2604.00058] GenoBERT: A Language Model for Accurate Genotype Imputation

[2604.00058] GenoBERT: A Language Model for Accurate Genotype Imputation

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2604.00058: GenoBERT: A Language Model for Accurate Genotype Imputation

Quantitative Biology > Genomics arXiv:2604.00058 (q-bio) [Submitted on 31 Mar 2026] Title:GenoBERT: A Language Model for Accurate Genotype Imputation Authors:Lei Huang, Chuan Qiu, Kuan-Jui Su, Anqi Liu, Yun Gong, Weiqiang Lin, Lindong Jiang, Chen Zhao, Meng Song, Jeffrey Deng, Qing Tian, Zhe Luo, Ping Gong, Hui Shen, Chaoyang Zhang, Hong-Wen Deng View a PDF of the paper titled GenoBERT: A Language Model for Accurate Genotype Imputation, by Lei Huang and 15 other authors View PDF Abstract:Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-attention mechanism to capture both short- and long-range linkage disequilibrium (LD) dependencies. Benchmarking on two independent datasets including the Louisiana Osteoporosis Study (LOS) and the 1000 Genomes Project (1KGP) across ancestry groups and multiple genotype missingness levels (5-50%) shows that GenoBERT achieves the highest overall accuracy compared to four baseline methods (Beagle5.4, SCDA, BiU-Net, and STICI). At practical sparsity levels (up to 25% missing), GenoBERT attains high overall imputation accuracy ($r^2 approx 0.98$) across datasets, and maintains robust performance ($r^2 >...

Originally published on April 02, 2026. Curated by AI News.

Related Articles

Llms

[D] thoughts on current community moving away from heavy math?

I don't know about how you guys feel but even before LLM started, many papers are already leaning on empirical findings, architecture des...

Reddit - Machine Learning · 1 min ·
Gemini is making it faster for distressed users to reach mental health resources  | The Verge
Llms

Gemini is making it faster for distressed users to reach mental health resources  | The Verge

The update follows a wrongful death lawsuit alleging Gemini ‘coached’ a man to die by suicide.

The Verge - AI · 4 min ·
Anthropic Claude AI training model targets AI skills gap | ETIH EdTech News
Llms

Anthropic Claude AI training model targets AI skills gap | ETIH EdTech News

AI in education, edtech AI tools, and AI skills training drive Anthropic’s Claude curriculum. ETIH edtech news covers how AI fluency, wor...

AI Tools & Products · 6 min ·
I use ChatGPT every day — I stick to these 3 rules to protect my privacy
Llms

I use ChatGPT every day — I stick to these 3 rules to protect my privacy

I stick to three essential rules whenever I open up a new chat in ChatGPT to always protect my privacy and keep my data secure

AI Tools & Products · 9 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime