[2602.19845] I Dropped a Neural Net
Summary
The paper 'I Dropped a Neural Net' tackles an unusual machine learning challenge: given a trained neural network whose layers have been shuffled, recover their original order. The authors propose a method that exploits stability conditions imposed during training to reconstruct the correct ordering.
Why It Matters
Recovering the order of shuffled layers reveals how training leaves structural fingerprints in a network's weights. Beyond the puzzle itself, this research contributes to the broader understanding of neural network training dynamics and optimization.
Key Takeaways
- The paper presents a method for reordering shuffled neural network layers.
- Stability conditions during training, such as dynamic isometry, leave the product W_out W_in for correctly paired layers with a negative diagonal structure.
- The diagonal dominance ratio of W_out W_in therefore serves as a signal for pairing each block's input and output projections.
- The combined search space of (48!)^2 ≈ 10^122 orderings exceeds the number of atoms in the observable universe.
- Seeding the block ordering with a rough proxy (e.g., delta-norm or ||W_out||_F) and then hill-climbing drives the mean squared error to zero.
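The size of the search space quoted above is easy to verify directly; a minimal check of the arithmetic:

```python
import math

# Pairing 48 input/output projections (48! possibilities) and ordering
# the 48 reassembled blocks (another 48!) gives a combined search space
# of (48!)^2, as stated in the paper's abstract.
pairings = math.factorial(48)
orderings = math.factorial(48)
log10_size = math.log10(pairings) + math.log10(orderings)
print(f"(48!)^2 ~ 10^{log10_size:.1f}")  # roughly 10^122
```

For comparison, the number of atoms in the observable universe is commonly estimated at around 10^80, so brute-force enumeration is hopeless and structural signals are essential.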
Computer Science > Machine Learning

arXiv:2602.19845 (cs) [Submitted on 23 Feb 2026]
Title: I Dropped a Neural Net
Authors: Hyunwoo Park

Abstract: A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street: they trained a neural net, shuffled all 96 layers, and asked to put them back in order. Given unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into pairing each block's input and output projections ($48!$ possibilities) and ordering the reassembled blocks ($48!$ possibilities), for a combined search space of $(48!)^2 \approx 10^{122}$, more than the number of atoms in the observable universe. We show that stability conditions during training like dynamic isometry leave the product $W_{\text{out}} W_{\text{in}}$ for correctly paired layers with a negative diagonal structure, allowing us to use the diagonal dominance ratio as a signal for pairing. For ordering, we seed-initialize with a rough proxy such as delta-norm or $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.19845 [cs.LG] (or arXiv:2602.19845v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.19845
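The pairing signal can be illustrated on toy matrices. This is a sketch under the assumption that a correctly paired product W_out W_in is close to a negative identity, mimicking the negative diagonal structure the paper attributes to dynamic isometry; the matrix construction and the `diag_dominance` helper are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4  # hidden width and number of blocks (toy sizes; the paper has 48 blocks)

# Toy blocks: each correct pair is built so that W_out @ W_in is roughly
# minus the identity plus small noise (an assumption of this sketch).
W_in = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(k)]
W_out = [-np.linalg.pinv(w) + 0.05 * rng.normal(size=(d, d)) for w in W_in]

def diag_dominance(M):
    """Fraction of the matrix's absolute mass on the diagonal."""
    return np.abs(np.diag(M)).sum() / np.abs(M).sum()

# Score every (output, input) combination; correct pairs concentrate
# their mass on the diagonal, mismatched pairs spread it uniformly.
scores = np.array([[diag_dominance(W_out[i] @ W_in[j]) for j in range(k)]
                   for i in range(k)])
recovered = scores.argmax(axis=1)
print(recovered)  # each output should match its own input: [0 1 2 3]
```

A mismatched product behaves like a random matrix, putting only about 1/d of its mass on the diagonal, so the gap between matched and mismatched scores is large even with noisy weights.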
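The ordering stage described in the abstract (seed with a rough proxy such as $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error) can likewise be sketched with linear residual blocks. Making the weight norms grow with depth, so the norm proxy is informative, is an assumption of this toy, not a claim from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 8, 5, 64  # width, number of blocks, data points (toy sizes)

# Toy residual blocks whose weight norms grow with depth, so sorting
# by Frobenius norm gives a reasonable initial ordering.
blocks = [(1 + i) * rng.normal(size=(d, d)) / np.sqrt(d) for i in range(k)]
X = rng.normal(size=(n, d))

def forward(X, order):
    h = X
    for i in order:
        h = h + h @ blocks[i].T  # residual block: h <- h + W h
    return h

target = forward(X, range(k))  # outputs of the network in its true order

def mse(order):
    return float(np.mean((forward(X, order) - target) ** 2))

# Seed with the Frobenius-norm proxy, then hill-climb over pairwise
# swaps, accepting any swap that lowers the reconstruction error.
order = sorted(range(k), key=lambda i: np.linalg.norm(blocks[i]))
improved = True
while improved:
    improved = False
    for a in range(k):
        for b in range(a + 1, k):
            cand = order[:]
            cand[a], cand[b] = cand[b], cand[a]
            if mse(cand) < mse(order):
                order, improved = cand, True
print(order, mse(order))  # expect zero MSE once the true order is found
```

Because residual blocks do not commute in general, only the true ordering reproduces the target outputs exactly, which is why hitting zero MSE certifies the recovered order.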