[2602.02201] Cardinality-Preserving Attention Channels for Graph Transformers in Molecular Property Prediction
Summary
This article presents a graph transformer model that augments attention with cardinality-preserving channels to improve molecular property prediction, a task central to drug discovery.
Why It Matters
Molecular property prediction is essential in drug discovery, especially when labeled data is limited. This research introduces a new model that improves prediction accuracy, potentially accelerating the development of new drugs and therapies.
Key Takeaways
- Introduces a graph transformer with a query-conditioned cardinality-preserving attention (CPA) channel (see the sketch after this list).
- Demonstrates consistent improvements over protocol-matched baselines on 11 public benchmarks spanning MoleculeNet, OGB, and TDC ADMET.
- Combines structured sparse attention and Graphormer-inspired biases with dual-objective self-supervised pretraining (masked reconstruction and contrastive alignment of augmented views).
- Reports rigorous ablations that confirm CPA's contribution and rule out simple size shortcuts.
- Includes code and reproducibility artifacts for further research.
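The abstract does not spell out the CPA mechanism, but one plausible reading is sketched below: a query-conditioned channel whose per-neighbor gates are summed without softmax normalization, so the aggregated message still reflects how many nodes were attended to (the dynamic support size), complementing standard normalized attention heads and static centrality embeddings. This is a minimal illustration under that assumption, not the paper's implementation; names such as `CardinalityPreservingChannel` and `adj_mask` are hypothetical.

```python
import torch
import torch.nn as nn

class CardinalityPreservingChannel(nn.Module):
    """Illustrative sketch: softmax attention weights always sum to 1, erasing
    neighborhood size; here each neighbor is gated by a query-conditioned
    sigmoid and the gated values are summed without normalization, so the
    output magnitude still carries a support-size (cardinality) signal."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, adj_mask: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features; adj_mask: (num_nodes, num_nodes),
        # 1 where attention is allowed (e.g., a structured sparse pattern), 0 elsewhere.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        gates = torch.sigmoid(logits) * adj_mask   # per-edge gates in [0, 1]
        return gates @ v                           # unnormalized sum preserves cardinality

if __name__ == "__main__":
    x = torch.randn(6, 32)                          # 6 atoms, 32-dim features
    adj = (torch.rand(6, 6) > 0.5).float()          # toy sparse attention mask
    out = CardinalityPreservingChannel(32)(x, adj)
    print(out.shape)                                # torch.Size([6, 32])
```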
Title: Cardinality-Preserving Attention Channels for Graph Transformers in Molecular Property Prediction
Authors: Abhijit Gupta
Submitted on 2 Feb 2026 (v1); last revised 14 Feb 2026 (this version, v4)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.02201 [cs.LG]
Abstract: Molecular property prediction is crucial for drug discovery when labeled data are scarce. This work presents a graph transformer augmented with a query-conditioned cardinality-preserving attention (CPA) channel that retains dynamic support-size signals complementary to static centrality embeddings. The approach combines structured sparse attention with Graphormer-inspired biases (shortest-path distance, centrality, direct-bond features) and unified dual-objective self-supervised pretraining (masked reconstruction and contrastive alignment of augmented views). Evaluation on 11 public benchmarks spanning MoleculeNet, OGB, and TDC ADMET demonstrates consistent improvements over protocol-matched baselines under matched pretraining, optimization, and hyperparameter tuning. Rigorous ablations confirm CPA's contributions and rule out simple size shortcuts. Code and reproducibility artifacts are provided.
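For context on the Graphormer-inspired biases named in the abstract: in Graphormer-style attention, learned scalar offsets indexed by shortest-path distance and by direct-bond (edge) type are added to the query-key logits, and a centrality (degree) embedding is added to the node features before attention. The sketch below shows that general bias pattern, not the paper's exact design; parameters such as `max_spd`, `num_bond_types`, and `max_degree` are illustrative.

```python
import torch
import torch.nn as nn

class BiasedGraphAttention(nn.Module):
    """Single-head attention with Graphormer-style structural biases: a learned
    scalar per shortest-path-distance bucket and per bond type is added to the
    logits, and a degree (centrality) embedding is added to node features."""

    def __init__(self, dim: int, max_spd: int = 16, num_bond_types: int = 8, max_degree: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.spd_bias = nn.Embedding(max_spd, 1)          # bias per shortest-path distance
        self.bond_bias = nn.Embedding(num_bond_types, 1)  # bias per direct-bond type
        self.centrality = nn.Embedding(max_degree, dim)   # static degree embedding
        self.scale = dim ** -0.5

    def forward(self, x, spd, bond, degree):
        # x: (N, dim); spd, bond: (N, N) integer indices; degree: (N,) integer degrees.
        x = x + self.centrality(degree)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        logits = logits + self.spd_bias(spd).squeeze(-1) + self.bond_bias(bond).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ v

if __name__ == "__main__":
    N, dim = 5, 32
    x = torch.randn(N, dim)
    spd = torch.randint(0, 16, (N, N))
    bond = torch.randint(0, 8, (N, N))
    degree = torch.randint(0, 32, (N,))
    print(BiasedGraphAttention(dim)(x, spd, bond, degree).shape)  # torch.Size([5, 32])
```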
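The dual-objective pretraining combines a masked-reconstruction term with a contrastive term over augmented views of the same molecule. A minimal sketch of how such a joint loss is typically assembled is below; an NT-Xent-style contrastive loss and a simple weighting `lam` are assumed here, since the abstract does not specify the exact objectives or their weighting.

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(pred_feats, true_feats, mask):
    """Reconstruct features of masked nodes only (mask: boolean over nodes)."""
    return F.mse_loss(pred_feats[mask], true_feats[mask])

def nt_xent_loss(z1, z2, temperature=0.2):
    """Contrastive alignment of two augmented views of the same molecules.
    z1, z2: (batch, dim) graph-level embeddings; matching rows are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature     # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))     # i-th view-1 graph matches i-th view-2 graph
    return F.cross_entropy(logits, targets)

def pretraining_loss(pred_feats, true_feats, mask, z1, z2, lam=1.0):
    """Dual objective: masked reconstruction plus weighted contrastive alignment."""
    return masked_reconstruction_loss(pred_feats, true_feats, mask) + lam * nt_xent_loss(z1, z2)
```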