[2603.04198] Stable and Steerable Sparse Autoencoders with Weight Regularization

Statistics > Machine Learning
arXiv:2603.04198 (stat)
[Submitted on 4 Mar 2026]

Title: Stable and Steerable Sparse Autoencoders with Weight Regularization
Authors: Piotr Jedryszek, Oliver M. Crook

Abstract: Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanat...
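To make the ingredients named in the abstract concrete, here is a minimal sketch of a TopK SAE objective with an L2 weight penalty, tied initialization, and a unit-norm decoder. It is not the authors' implementation: the function names, the toy dimensions, the coefficient `l2_coeff`, and the choice to sum squared weights over both matrices are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
import random


def transpose(W):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*W)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def unit_norm_columns(W):
    """Rescale every column of W to unit Euclidean norm."""
    n_rows, n_cols = len(W), len(W[0])
    norms = [math.sqrt(sum(W[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    return [[W[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]


def topk(z, k):
    """Keep the k largest pre-activations and zero out the rest."""
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]


def sae_loss(x, W_enc, W_dec, k, l2_coeff):
    """Squared reconstruction error plus an L2 penalty on all weights.

    Assumed form (hypothetical, for illustration only):
        ||x - W_dec topk(W_enc x)||^2 + l2_coeff * (||W_enc||^2 + ||W_dec||^2)
    """
    z = topk(matvec(W_enc, x), k)
    x_hat = matvec(W_dec, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    penalty = sum(w * w for W in (W_enc, W_dec) for row in W for w in row)
    return recon + l2_coeff * penalty, z


rng = random.Random(0)
d, m, k = 4, 8, 2  # toy input size, dictionary size, active features

# Decoder columns are the feature directions, constrained to unit norm.
W_dec = unit_norm_columns([[rng.gauss(0, 1) for _ in range(m)]
                           for _ in range(d)])
# Tied initialization: the encoder starts as the decoder's transpose.
W_enc = transpose(W_dec)

x = [rng.gauss(0, 1) for _ in range(d)]
loss_plain, z = sae_loss(x, W_enc, W_dec, k, l2_coeff=0.0)
loss_reg, _ = sae_loss(x, W_enc, W_dec, k, l2_coeff=1e-3)
```

In this sketch, a decoder with unit-norm columns contributes a constant `m` to the penalty, so right after tied initialization the two losses differ by exactly `l2_coeff * 2 * m`. In a real training loop the same penalty would more commonly be added to the loss inside an autodiff framework, or approximated with the optimizer's weight-decay setting.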

Originally published on March 05, 2026. Curated by AI News.
