[2603.26994] ImmSET: Sequence-Based Predictor of TCR-pMHC Specificity at Scale
Computer Science > Machine Learning
arXiv:2603.26994 (cs)
[Submitted on 27 Mar 2026]

Title: ImmSET: Sequence-Based Predictor of TCR-pMHC Specificity at Scale
Authors: Marco Garcia Noceda, Matthew T Noakes, Andrew FigPope, Daniel E Mattox, Bryan Howie, Harlan Robins

Abstract: T cells are a critical component of the adaptive immune system, playing a role in infectious disease, autoimmunity, and cancer. T cell function is mediated by the T cell receptor (TCR) protein, a highly diverse receptor targeting specific peptides presented by the major histocompatibility complex (pMHCs). Predicting the specificity of TCRs for their cognate pMHCs is central to understanding adaptive immunity and enabling personalized therapies. However, accurate prediction of this protein-protein interaction remains challenging due to the extreme diversity of both TCRs and pMHCs. Here, we present ImmSET (Immune Synapse Encoding Transformer), a novel sequence-based architecture designed to model interactions among sets of variable-length biological sequences. We train this model across a range of dataset sizes and compositions and study the resulting models' generalization to pMHC targets. We describe a failure mode in prior sequence-based approaches that inflates previously reported performance on this task and show that ImmSET remains ro...