[2602.00924] Supervised sparse auto-encoders for interpretable and compositional representations
Computer Science > Artificial Intelligence
arXiv:2602.00924 (cs)
[Submitted on 31 Jan 2026 (v1), last revised 8 May 2026 (this version, v2)]

Title: Supervised sparse auto-encoders for interpretable and compositional representations
Authors: Ouns El Harzli, Hugo Wallner, Yoonsoo Nam, Haixuan Xavier Tao

Abstract: Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models, a mathematical framework from neural collapse theory, and by supervising the task. We supervise (decoder-only) SAEs to reconstruct feature vectors by jointly learning sparse concept embeddings and decoder weights. Validated on Stable Diffusion 3.5, our approach demonstrates compositional generalization, successfully reconstructing images with concept combinations unseen during training and enabling feature-level intervention for semantic image editing without prompt modification.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.00924 [cs.AI] (or arXiv:2602.00924v2 [cs.AI] for this version)
https://doi.org/10.48550/a...
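The core idea in the abstract (jointly learning sparse concept embeddings and decoder weights, with sparsity coming from supervision rather than an $L_1$ penalty) can be illustrated with a toy sketch. This is a hypothetical NumPy illustration of that spirit, not the paper's actual objective or optimizer: feature vectors `X` are factored as `Z @ D`, where the support of the codes `Z` is fixed by known concept labels (`mask`), so no non-smooth penalty is needed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "feature vectors": each sample mixes a few ground-truth concept directions.
n, k, d = 64, 3, 16
labels = (rng.random((n, k)) < 0.4).astype(float)   # which concepts are present
true_dirs = rng.normal(size=(k, d))
X = labels @ true_dirs

def supervised_sparse_decoder(X, mask, lr=0.01, steps=2000, seed=0):
    """Jointly learn concept embeddings Z and decoder weights D so that
    Z @ D reconstructs X. Sparsity comes from supervision: Z is kept
    supported on the labeled concepts (mask) instead of using an L1 penalty.
    Hypothetical toy version, not the paper's exact method."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = mask.shape[1]
    Z = rng.normal(scale=0.1, size=(n, k)) * mask
    D = rng.normal(scale=0.1, size=(k, d))
    for _ in range(steps):
        R = Z @ D - X                       # reconstruction residual
        Z = (Z - lr * (R @ D.T)) * mask     # gradient step, then re-mask
        D -= lr * (Z.T @ R) / n             # averaged gradient step on decoder
    return Z, D

Z, D = supervised_sparse_decoder(X, labels)
err = np.linalg.norm(Z @ D - X) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")

# Feature-level intervention in the spirit of the abstract's semantic editing:
# activate concept 0 in a sample that lacks it and decode the edited vector.
i = int(np.argmax(labels[:, 0] == 0))
z_edit = Z[i].copy()
z_edit[0] = 1.0
x_edit = z_edit @ D
```

Because the decoder is linear and codes are concept-aligned, reconstructing a label combination unseen during training only requires that each individual concept direction was learned, which is one way to read the abstract's compositional-generalization claim.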