[2603.23361] Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
Computer Science > Machine Learning
arXiv:2603.23361 (cs)
[Submitted on 24 Mar 2026]
Authors: Nobuyuki Ota
Abstract: Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Virtual Cell Embedder architecture mirrors the spatial compartmentalization of the cell: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. On five held-out genes, CDT-III achieves per-gene RNA r=0.843 and protein r=0.969. Adding protein prediction improves RNA performance (r=0.804 to 0.843), demonstrating that downstream tasks regularize upstream representations. Protein supervision sharpens DNA-level interpretability, increasing CTCF enrichment by 30%. Applied to in silico CD52 knockdown approximating Alemtuzumab, the model predicts 29/29 protein changes correctly and rediscovers 5 of 7 known clinical side effects without clinical data. Gradient-based side effect profiling requires only unperturbed baseline data (r=0.939), enabling screening of all 2,361 genes without new experiments.
Comments: Sub...