[2603.25857] In-Context Molecular Property Prediction with LLMs: A Blinding Study on Memorization and Knowledge Conflicts
Computer Science > Machine Learning

arXiv:2603.25857 (cs) [Submitted on 26 Mar 2026]

Title: In-Context Molecular Property Prediction with LLMs: A Blinding Study on Memorization and Knowledge Conflicts

Authors: Matthias Busch, Marius Tacke, Sviatlana V. Lamaka, Mikhail L. Zheludkevich, Christian J. Cyron, Christian Feiler, Roland C. Aydin

Abstract: The capabilities of large language models (LLMs) have expanded beyond natural language processing to scientific prediction tasks, including molecular property prediction. However, their effectiveness in in-context learning remains ambiguous, particularly given the potential for training data contamination in widely used benchmarks. This paper investigates whether LLMs perform genuine in-context regression on molecular properties or rely primarily on memorized values. Furthermore, we analyze the interplay between pre-trained knowledge and in-context information through a series of progressively blinded experiments. We evaluate nine LLM variants across three families (GPT-4.1, GPT-5, Gemini 2.5) on three MoleculeNet datasets (Delaney solubility, Lipophilicity, QM7 atomization energy) using a systematic blinding approach that iteratively reduces available information. Complementing this, we utilize varying in-context sample sizes (...
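The progressive blinding described in the abstract, combined with varying in-context sample sizes, can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' code: the blinding levels, the `blind` and `build_prompt` helpers, and the example solubility values are all illustrative assumptions about how such prompts might be assembled.

```python
# Hypothetical sketch of a progressive-blinding prompt setup: each level
# strips more identifying information from the molecule representation,
# and k controls the number of in-context examples. Field names and
# blinding levels are illustrative, not taken from the paper.

def blind(smiles: str, name: str, level: int) -> str:
    """Return the molecule representation for a given blinding level."""
    if level == 0:                 # full information: compound name + SMILES
        return f"{name} ({smiles})"
    if level == 1:                 # SMILES only, no compound name
        return smiles
    # fully anonymized: an opaque ID that carries no chemical information
    return f"molecule_{abs(hash(smiles)) % 10_000}"

def build_prompt(examples, query, level: int, k: int) -> str:
    """Assemble an in-context regression prompt from k labeled examples."""
    lines = ["Predict the aqueous solubility (log mol/L) of the last molecule."]
    for smiles, name, y in examples[:k]:
        lines.append(f"{blind(smiles, name, level)} -> {y:.2f}")
    q_smiles, q_name = query
    lines.append(f"{blind(q_smiles, q_name, level)} -> ?")
    return "\n".join(lines)

# Toy Delaney-style examples (values illustrative only).
examples = [
    ("CCO", "ethanol", 1.10),
    ("c1ccccc1", "benzene", -1.64),
    ("CC(=O)O", "acetic acid", 1.22),
]
prompt = build_prompt(examples, ("CCN", "ethylamine"), level=1, k=2)
print(prompt)
```

Sweeping `level` from 0 to 2 while holding the examples fixed would separate what the model recalls about a named compound from what it infers from the in-context labels alone, which is the contrast the blinding study targets.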