[2604.00058] GenoBERT: A Language Model for Accurate Genotype Imputation

arXiv - AI, April 02, 2026

About this article

arXiv:2604.00058 (q-bio, Quantitative Biology > Genomics), submitted on 31 Mar 2026

Title: GenoBERT: A Language Model for Accurate Genotype Imputation

Authors: Lei Huang, Chuan Qiu, Kuan-Jui Su, Anqi Liu, Yun Gong, Weiqiang Lin, Lindong Jiang, Chen Zhao, Meng Song, Jeffrey Deng, Qing Tian, Zhe Luo, Ping Gong, Hui Shen, Chaoyang Zhang, Hong-Wen Deng

Abstract: Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-attention mechanism to capture both short- and long-range linkage disequilibrium (LD) dependencies. Benchmarking on two independent datasets, the Louisiana Osteoporosis Study (LOS) and the 1000 Genomes Project (1KGP), across ancestry groups and multiple genotype missingness levels (5-50%) shows that GenoBERT achieves the highest overall accuracy compared to four baseline methods (Beagle5.4, SCDA, BiU-Net, and STICI). At practical sparsity levels (up to 25% missing), GenoBERT attains high overall imputation accuracy ($r^2 \approx 0.98$) across datasets, and maintains robust performance ($r^2 >...
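The abstract does not spell out GenoBERT's token vocabulary or its exact definition of "overall" r². As a rough, non-authoritative illustration of the evaluation setup it describes (hide a fraction of genotype sites, then score the imputed values with r²), here is a minimal Python sketch. The four-token phased vocabulary, the MASK token, and the helper names tokenize, mask_tokens, dosage and r_squared are hypothetical. r² is assumed to be the squared Pearson correlation between true and imputed alternate-allele dosages, a common convention in imputation benchmarks; no model is implemented here.

```python
import random

# HYPOTHETICAL tokenization of a phased biallelic genotype "a|b" (a, b in {0, 1}).
# Four phased states plus a MASK token; this is an illustrative assumption,
# not GenoBERT's published scheme.
PHASED_TO_TOKEN = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
MASK_TOKEN = 4


def tokenize(haplotype_pairs):
    """Map (allele_on_hap1, allele_on_hap2) pairs to token ids."""
    return [PHASED_TO_TOKEN[pair] for pair in haplotype_pairs]


def mask_tokens(tokens, missing_rate, rng):
    """Hide a random fraction of sites (e.g. 0.05 to 0.50) behind MASK_TOKEN.

    Returns the masked sequence and the sorted indices that were hidden.
    """
    n_mask = round(missing_rate * len(tokens))
    hidden = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in hidden:
        masked[i] = MASK_TOKEN
    return masked, hidden


def dosage(token):
    """Alternate-allele count (0, 1 or 2) implied by a phased token."""
    return {0: 0, 1: 1, 2: 1, 3: 2}[token]


def r_squared(truth, imputed):
    """ASSUMED metric: squared Pearson correlation of true vs imputed dosages."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (p - mean_i) for t, p in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((p - mean_i) ** 2 for p in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov * cov / (var_t * var_i)


# Usage: mask 25% of 100 sites, then score an imputer's dosages at the hidden sites.
rng = random.Random(0)
tokens = tokenize([(0, 0), (0, 1), (1, 0), (1, 1)] * 25)
masked, hidden = mask_tokens(tokens, 0.25, rng)
truth = [dosage(tokens[i]) for i in hidden]
# A real run would replace `truth` below with the model's predicted dosages.
print(len(hidden), r_squared(truth, truth))
```

One design consequence worth noting: scoring on dosages collapses the phased states 0|1 and 1|0 into the same value, so a dosage-based r² cannot reward or penalize phase errors at heterozygous sites.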

Originally published on April 02, 2026. Curated by AI News.