[2602.14687] SynthSAEBench: Evaluating Sparse Autoencoders on Scalable Realistic Synthetic Data
Summary
The paper introduces SynthSAEBench, a toolkit for evaluating Sparse Autoencoders (SAEs) on large-scale synthetic data with realistic feature structure, addressing the noise and small scale that limit current benchmarks.
Why It Matters
SynthSAEBench provides a standardized way to assess SAE architectures, enabling researchers to diagnose failures and validate improvements precisely. This matters for interpretability research on large language models (LLMs), where existing benchmarks are often too noisy to distinguish architectural gains.
Key Takeaways
- SynthSAEBench generates large-scale synthetic data with realistic features.
- It enables direct comparison of SAE architectures through a standardized benchmark model, SynthSAEBench-16k.
- The benchmark reveals a new failure mode: Matching Pursuit SAEs exploit superposition noise to improve reconstruction without learning ground-truth features, a form of overfitting by expressive encoders.
- It complements existing LLM benchmarks by providing ground-truth features.
- Researchers can use SynthSAEBench to improve SAE validation before scaling.
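The feature characteristics the takeaways mention (sparsity, superposition) can be illustrated with a minimal synthetic-data sketch. This is not the SynthSAEBench toolkit or its API; all names and parameters here (`fire_prob`, `d_model`, the exponential magnitude distribution) are illustrative assumptions showing how embedding more sparse features than model dimensions produces superposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# More features than model dimensions -> features must share directions
# (superposition), so reconstructions carry interference noise.
n_features, d_model, n_samples = 64, 16, 1000

# Random unit-norm feature directions form an overcomplete dictionary.
directions = rng.normal(size=(n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Sparse activations: each feature fires independently with low probability,
# with a positive magnitude when it does.
fire_prob = 0.05
active = rng.random((n_samples, n_features)) < fire_prob
magnitudes = rng.exponential(1.0, size=(n_samples, n_features))
activations = active * magnitudes

# "Model activations" are sums of active feature directions in superposition.
data = activations @ directions

print(data.shape)            # (1000, 16)
print(active.sum(1).mean())  # avg active features per sample, around 3.2
```

A realistic generator would add the correlation and hierarchy structure the paper describes (features that co-fire or imply one another); this sketch only captures sparsity and superposition.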
Computer Science > Machine Learning
arXiv:2602.14687 (cs) [Submitted on 16 Feb 2026]
Title: SynthSAEBench: Evaluating Sparse Autoencoders on Scalable Realistic Synthetic Data
Authors: David Chanin, Adrià Garriga-Alonso
Abstract: Improving Sparse Autoencoders (SAEs) requires benchmarks that can precisely validate architectural innovations. However, current SAE benchmarks on LLMs are often too noisy to differentiate architectural improvements, and current synthetic data experiments are too small-scale and unrealistic to provide meaningful comparisons. We introduce SynthSAEBench, a toolkit for generating large-scale synthetic data with realistic feature characteristics including correlation, hierarchy, and superposition, and a standardized benchmark model, SynthSAEBench-16k, enabling direct comparison of SAE architectures. Our benchmark reproduces several previously observed LLM SAE phenomena, including the disconnect between reconstruction and latent quality metrics, poor SAE probing results, and a precision-recall trade-off mediated by L0. We further use our benchmark to identify a new failure mode: Matching Pursuit SAEs exploit superposition noise to improve reconstruction without learning ground-truth features, suggesting that more expressive encoders can easily overfit. SynthSAEBench complements L...
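Because a synthetic generator knows the ground-truth feature directions, latent quality can be measured directly rather than through proxies like reconstruction loss, which the abstract notes can diverge. Below is a minimal sketch of one such metric, mean max cosine similarity between ground-truth and learned dictionary directions. The `feature_recovery` helper is hypothetical, not part of the paper's toolkit, and assumes both dictionaries are available as row matrices of directions.

```python
import numpy as np

def feature_recovery(true_dirs: np.ndarray, learned_dirs: np.ndarray) -> float:
    """Mean, over ground-truth features, of the max |cosine similarity|
    against any learned direction -- a simple latent-quality proxy."""
    t = true_dirs / np.linalg.norm(true_dirs, axis=1, keepdims=True)
    l = learned_dirs / np.linalg.norm(learned_dirs, axis=1, keepdims=True)
    cos = t @ l.T  # shape (n_true, n_learned)
    return float(np.abs(cos).max(axis=1).mean())

rng = np.random.default_rng(0)
true_dirs = rng.normal(size=(32, 16))

# A dictionary that matches the ground truth exactly scores 1.0;
# a random dictionary scores well below that.
perfect = feature_recovery(true_dirs, true_dirs)
baseline = feature_recovery(true_dirs, rng.normal(size=(32, 16)))
print(perfect, baseline)
```

An SAE can improve reconstruction (e.g., by exploiting superposition interference, as the Matching Pursuit failure mode shows) while this kind of ground-truth score stagnates, which is exactly the disconnect such a benchmark is designed to expose.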