[2603.04198] Stable and Steerable Sparse Autoencoders with Weight Regularization
Statistics > Machine Learning
arXiv:2603.04198 (stat)
[Submitted on 4 Mar 2026]

Title: Stable and Steerable Sparse Autoencoders with Weight Regularization
Authors: Piotr Jedryszek, Oliver M. Crook

Abstract: Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization, adding L1 or L2 penalties on the encoder and decoder weights, and evaluate how this regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving mean automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations...
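To make the training choices named in the abstract concrete, below is a minimal PyTorch sketch of an SAE objective combining tied initialization, a unit-norm decoder constraint, and an L2 penalty on encoder and decoder weights. This is not the authors' released code; the dimensions, penalty coefficients, and function names are illustrative assumptions.

```python
# Illustrative sketch only: hyperparameters (d_model, n_features,
# l1_act, l2_weight) are assumptions, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        # Tied initialization: decoder weights start as the
        # transpose of the encoder weights.
        with torch.no_grad():
            self.decoder.weight.copy_(self.encoder.weight.t())

    def normalize_decoder(self):
        # Unit-norm decoder constraint: renormalize each feature's
        # decoder direction (a column of the weight matrix).
        with torch.no_grad():
            self.decoder.weight.div_(
                self.decoder.weight.norm(dim=0, keepdim=True).clamp_min(1e-8)
            )

    def forward(self, x):
        z = F.relu(self.encoder(x))  # feature activations
        x_hat = self.decoder(z)      # reconstruction
        return x_hat, z


def sae_loss(model, x, l1_act=1e-3, l2_weight=1e-4):
    x_hat, z = model(x)
    recon = F.mse_loss(x_hat, x)
    sparsity = l1_act * z.abs().mean()  # usual activation sparsity term
    # The regularizer studied here: an L2 penalty on the weights
    # themselves (an L1 variant would use .abs().sum() instead).
    weight_reg = l2_weight * (
        model.encoder.weight.pow(2).sum() + model.decoder.weight.pow(2).sum()
    )
    return recon + sparsity + weight_reg
```

In a typical training loop, `model.normalize_decoder()` would be called after each optimizer step so the decoder columns stay unit-norm while the weight penalty acts on the encoder and on the decoder's pre-normalization updates.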