[2409.15318] On the Complexity of Neural Computation in Superposition

arXiv - AI · 4 min read

Summary

This paper studies the complexity of neural computation in superposition, establishing the first lower bounds on the neurons and parameters required to compute (rather than merely represent) features, along with nearly matching constructive upper bounds. Together these results place explicit limits on how far a model can be sparsified while preserving its expressibility.

Why It Matters

Understanding the complexity of neural computation in superposition is crucial for optimizing neural network architectures. This research provides foundational insights that can influence the design of more efficient models, impacting various applications in AI and machine learning.

Key Takeaways

  • Establishes lower bounds for neural networks computing in superposition.
  • Demonstrates the exponential gap between computing features and merely representing them.
  • Provides upper bounds for specific logical operations using fewer neurons and parameters.

Computer Science > Computational Complexity

arXiv:2409.15318 (cs) · Submitted on 5 Sep 2024 (v1), last revised 26 Feb 2026 (this version, v3)

Title: On the Complexity of Neural Computation in Superposition
Authors: Micah Adler, Nir Shavit

Abstract: Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $\Omega(\sqrt{m' \log m'})$ neurons and $\Omega(m' \log m')$ parameters. This implies an explicit limit on how much one can sparsify or distill a model while preserving its expressibility, and complements empirical scaling laws by implying the first subexponential bound on capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There i...
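To get a feel for what these asymptotic bounds imply, the sketch below evaluates the leading terms numerically (constants are unspecified in big-O/Omega notation, so the absolute values are illustrative only; the function names are ours, not the paper's):

```python
import math

def neurons_lower_bound(m_prime: int) -> float:
    """Leading term of the Omega(sqrt(m' log m')) neuron lower bound
    for computing m' features in superposition (constant factor omitted)."""
    return math.sqrt(m_prime * math.log(m_prime))

def capacity_upper_bound(n: int) -> float:
    """Leading term of the O(n^2 / log n) bound on how many features
    a network with n neurons can compute (constant factor omitted)."""
    return n ** 2 / math.log(n)

# Example: computing one million features needs only on the order of
# a few thousand neurons, far fewer than one neuron per feature.
print(f"{neurons_lower_bound(10**6):,.0f}")
print(f"{capacity_upper_bound(10**4):,.0f}")
```

The gap between `neurons_lower_bound(m')` and `m'` itself is what makes superposition efficient: the neuron count grows roughly as the square root of the feature count, while the capacity bound caps that advantage at slightly below quadratic.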

