[2505.13742] Understanding Task Representations in Neural Networks via Bayesian Ablation
Computer Science > Machine Learning
arXiv:2505.13742 (cs)
[Submitted on 19 May 2025 (v1), last revised 4 Apr 2026 (this version, v2)]

Title: Understanding Task Representations in Neural Networks via Bayesian Ablation
Authors: Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

Abstract: Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging because of their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2505.13742 [cs.LG] (or arXiv:2505.13742v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.13742 (arXiv-issued DOI via DataCite)

Submission history:
From: Andrew Nam
[v1] Mon, 19 May 2025
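The core idea in the abstract, placing a distribution over representational units and inferring their causal contributions from ablations, can be sketched in miniature. The toy network, the Bernoulli(0.5) prior over masks, the exp(-loss/T) likelihood, and the likelihood-weighting estimator below are all illustrative assumptions, not the paper's actual method or metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "trained" network, for illustration only: 8 hidden units,
# but only units 0 and 1 carry the task signal (the rest have zero weights).
n_hidden = 8
W1 = np.zeros((2, n_hidden))
W1[:, 0] = [1.0, 0.5]
W1[:, 1] = [-0.5, 1.0]
W2 = np.zeros((n_hidden, 1))
W2[0, 0], W2[1, 0] = 1.0, -1.0

X = rng.normal(size=(200, 2))
y = np.maximum(X @ W1, 0.0) @ W2  # targets = intact network's output

def task_loss(mask):
    """MSE of the network with hidden units ablated by a binary mask."""
    h = np.maximum(X @ W1, 0.0) * mask
    return float(np.mean((h @ W2 - y) ** 2))

# Ablation as approximate Bayesian inference: sample Bernoulli(0.5) masks,
# weight each mask by exp(-loss / T), and estimate each unit's posterior
# inclusion probability by likelihood weighting.
T = 0.01
masks = rng.integers(0, 2, size=(2000, n_hidden)).astype(float)
log_w = np.array([-task_loss(m) / T for m in masks])
w = np.exp(log_w - log_w.max())
w /= w.sum()
inclusion = w @ masks  # high value => ablating that unit hurts the task

print(np.round(inclusion, 2))
```

Under this sketch, the two signal-carrying units get inclusion probabilities near 1, while units whose ablation never changes the loss stay near the 0.5 prior; the spread of these probabilities is one crude way to read off how distributed the task representation is.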