[2510.11390] Medical Interpretability and Knowledge Maps of Large Language Models
Summary
This article presents a systematic study of medical interpretability in Large Language Models (LLMs), exploring how these models process and represent medical knowledge through various interpretability techniques.
Why It Matters
Understanding how LLMs interpret and manage medical knowledge is crucial for enhancing their reliability in healthcare applications. This research provides insights that can inform the development of better fine-tuning, unlearning, and debiasing strategies for LLMs, ultimately improving their effectiveness in medical tasks.
Key Takeaways
- The study employs four interpretability techniques to analyze LLMs in the medical domain.
- Key findings indicate that, in Llama3.3-70B, most medical knowledge is processed in the first half of the model's layers.
- Non-linear encoding of patient age and non-monotonic disease progression representations were observed.
- Drug knowledge clusters better by medical specialty than by mechanism of action in certain models (notably Llama3.3-70B).
- These insights can guide future research on improving LLMs for medical applications.
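One of the four techniques listed above, layer lesioning/removal, can be illustrated with a minimal toy sketch. The "model" below is a hypothetical stack of residual layers, not the paper's actual setup: each layer is skipped in turn, and the layers whose removal shifts the output most are taken as most important for the computation.

```python
import math

def make_layer(scale):
    # A toy residual "layer": x -> x + scale * tanh(x)
    return lambda x: x + scale * math.tanh(x)

# Hypothetical stack of four layers with different contribution scales.
LAYERS = [make_layer(s) for s in (0.9, 0.1, 0.5, 0.05)]

def run(x, skip=None):
    """Run the stack, optionally lesioning (skipping) one layer."""
    for i, layer in enumerate(LAYERS):
        if i != skip:
            x = layer(x)
    return x

baseline = run(1.0)
# Effect of lesioning each layer: output change relative to baseline.
effects = {i: abs(run(1.0, skip=i) - baseline) for i in range(len(LAYERS))}
ranked = sorted(effects, key=effects.get, reverse=True)
print(ranked)  # layers ordered from most to least important
```

In the paper this idea is applied to real transformer layers with medical prompts; here the ranking simply reflects which toy layer contributes most to the output.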
Computer Science > Machine Learning — arXiv:2510.11390 (cs)
[Submitted on 13 Oct 2025 (v1), last revised 21 Feb 2026 (this version, v2)]
Title: Medical Interpretability and Knowledge Maps of Large Language Models
Authors: Razvan Marinescu, Victoria-Elisabeth Gruber, Diego Fajardo
Abstract: We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how LLMs both represent and process medical knowledge through four different interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal, and (4) activation patching. We present knowledge maps of five LLMs which show, at coarse resolution, where knowledge about patients' ages, medical symptoms, diseases, and drugs is stored in the models. In particular, for Llama3.3-70B, we find that most medical knowledge is processed in the first half of the model's layers. In addition, we find several interesting phenomena: (i) age is often encoded in a non-linear and sometimes discontinuous manner at intermediate layers of the models, (ii) the disease progression representation is non-monotonic and circular at certain layers of the model, (iii) in Llama3.3-70B, drugs cluster better by medical specialty rather than mechanism of action […]
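The fourth technique in the abstract, activation patching, can also be sketched on a toy residual model (an illustrative stand-in, not the paper's code): cache each layer's contribution on a "clean" input, re-run a "corrupted" input while splicing in one clean contribution at a time, and measure how much of the clean output each patch restores.

```python
import math

SCALES = (0.9, 0.1, 0.5)  # hypothetical per-layer contribution scales

def run(x, patch_layer=None, patch_contrib=None):
    """Forward pass; optionally overwrite one layer's contribution."""
    contribs = []
    for i, s in enumerate(SCALES):
        h = s * math.tanh(x)
        if i == patch_layer:
            h = patch_contrib  # splice in the cached clean contribution
        contribs.append(h)
        x = x + h
    return x, contribs

clean_out, clean_acts = run(1.0)    # "clean" input
corrupt_out, _ = run(-1.0)          # "corrupted" input

# Fraction of the clean output each single-layer patch recovers.
recovery = {
    i: 1 - abs(run(-1.0, patch_layer=i, patch_contrib=clean_acts[i])[0]
               - clean_out) / abs(corrupt_out - clean_out)
    for i in range(len(SCALES))
}
print(recovery)
```

Layers whose patched-in clean activation recovers the most of the clean output are inferred to carry the causal information, which is how the paper localizes where medical knowledge is processed.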