[2602.00924] Supervised sparse auto-encoders for interpretable and compositional representations

[2602.00924] Supervised sparse auto-encoders for interpretable and compositional representations

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2602.00924: Supervised sparse auto-encoders for interpretable and compositional representations

Computer Science > Artificial Intelligence arXiv:2602.00924 (cs) [Submitted on 31 Jan 2026 (v1), last revised 8 May 2026 (this version, v2)] Title:Supervised sparse auto-encoders for interpretable and compositional representations Authors:Ouns El Harzli, Hugo Wallner, Yoonsoo Nam, Haixuan Xavier Tao View a PDF of the paper titled Supervised sparse auto-encoders for interpretable and compositional representations, by Ouns El Harzli and 3 other authors View PDF Abstract:Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models-a mathematical framework from neural collapse theory-and by supervising the task. We supervise (decoder-only) SAEs to reconstruct feature vectors by jointly learning sparse concept embeddings and decoder weights. Validated on Stable Diffusion 3.5, our approach demonstrates compositional generalization, successfully reconstructing images with concept combinations unseen during training, and enabling feature-level intervention for semantic image editing without prompt modification. Subjects: Artificial Intelligence (cs.AI) Cite as: arXiv:2602.00924 [cs.AI]   (or arXiv:2602.00924v2 [cs.AI] for this version)   https://doi.org/10.48550/a...

Originally published on May 11, 2026. Curated by AI News.

Related Articles

Machine Learning

What to expect from AlphaZero's value predictions [D]

An AlphaZero agent has learnt to predict the value of a game state by training on data generated by self-play by the model and a series o...

Reddit - Machine Learning · 1 min ·
Machine Learning

Open Source Projects related to CNNs to Contribute To? [D]

Around a decade a go I was tinkering a lot with CNNs for real time event detection. I enjoyed that a lot and always wanted to get back in...

Reddit - Machine Learning · 1 min ·
I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI | WIRED
Machine Learning

I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI | WIRED

For screenwriters like me—and job seekers all over—AI gig work is the new waiting tables. In eight months, I’ve done 20 of these soul-cru...

Wired - AI · 27 min ·
Machine Learning

Are Enterprises Using AI in the Wrong Places?

Most enterprise AI discussions still revolve around one question: But I’m starting to think that may be the wrong question entirely. The ...

Reddit - Artificial Intelligence · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime