[2603.01834] Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions


Condensed Matter > Materials Science
arXiv:2603.01834 (cond-mat) [Submitted on 2 Mar 2026]

Title: Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions
Authors: Vineeth Venugopal, Soroush Mahjoubi, Elsa Olivetti

Abstract: Large language models are increasingly applied to materials science, yet fundamental questions remain about their reliability and knowledge encoding. Evaluating 25 LLMs across four materials science tasks — over 200 base and fine-tuned configurations — we find that output modality fundamentally determines model behavior. For symbolic tasks, fine-tuning converges to consistent, verifiable answers with reduced response entropy, while for numerical tasks, fine-tuning improves prediction accuracy but models remain inconsistent across repeated inference runs, limiting their reliability as quantitative predictors. For numerical regression, we find that better performance can be obtained by extracting embeddings directly from intermediate transformer layers than from model text output, revealing an "LLM head bottleneck," though this effect is property- and dataset-dependent. Finally, we present a longitudinal study of GPT model performance in materials science, tracking four models over 18 months and observing 9–43% performance variation that pose...
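The "LLM head bottleneck" the abstract describes — a linear probe on intermediate hidden states outperforming regression on the model's scalar text output — can be illustrated with a toy numerical sketch. The code below is not the authors' method: it uses synthetic "hidden states" and a random one-dimensional "head" projection purely to show why collapsing a rich embedding to a single output can discard information a linear probe could still recover.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 32

# Toy stand-in for intermediate-layer hidden states (one row per material).
H = rng.normal(size=(n, d))

# A material property that is linearly decodable from the hidden states.
w_true = rng.normal(size=d)
y = H @ w_true

# A fixed scalar "head" projection, not aligned with the property direction —
# a caricature of the information loss at the model's output head.
v = rng.normal(size=d)
head_out = H @ v

def ridge_fit_predict(X, y, lam=1e-3):
    """Closed-form ridge regression; in-sample fit, purely illustrative."""
    Xb = np.column_stack([X, np.ones(len(X))])  # add bias column
    beta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return Xb @ beta

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Probe on full embeddings recovers the property almost exactly;
# regression on the scalar head output cannot.
r2_embed = r2(y, ridge_fit_predict(H, y))
r2_head = r2(y, ridge_fit_predict(head_out[:, None], y))
print(f"R^2 from embeddings:  {r2_embed:.3f}")
print(f"R^2 from head output: {r2_head:.3f}")
```

In practice one would extract real hidden states (e.g. with `output_hidden_states=True` in Hugging Face `transformers`) and fit the probe with a proper train/test split; this sketch only isolates the bottleneck effect itself.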

Originally published on March 03, 2026. Curated by AI News.

