[2602.19845] I Dropped a Neural Net
Summary
The paper 'I Dropped a Neural Net' tackles an unusual machine learning challenge: given a trained neural network whose layers have been shuffled, recover their original order. The authors propose a method that exploits stability conditions imposed during training to reconstruct the correct ordering.
Why It Matters
Recovering the order of shuffled layers reveals how training leaves structural fingerprints in a network's weights. Beyond the puzzle itself, this research contributes to the broader understanding of neural network training dynamics and optimization.
Key Takeaways
- The paper presents a method for reordering shuffled neural network layers.
- Stability conditions during training, such as dynamic isometry, leave the product W_out W_in for correctly paired layers with a negative diagonal structure.
- The diagonal dominance ratio of W_out W_in therefore serves as a signal for pairing each block's input and output projections.
- The combined search space of (48!)^2 ≈ 10^122 orderings exceeds the number of atoms in the observable universe.
- Seeding the block ordering with a rough proxy (e.g., delta-norm or ||W_out||_F) and then hill-climbing drives the mean squared error to zero.
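The size of the search space quoted above is easy to verify directly; a minimal check of the arithmetic:

```python
import math

# Pairing 48 input/output projections (48! possibilities) and ordering
# the 48 reassembled blocks (another 48!) gives a combined search space
# of (48!)^2, as stated in the paper's abstract.
pairings = math.factorial(48)
orderings = math.factorial(48)
log10_size = math.log10(pairings) + math.log10(orderings)
print(f"(48!)^2 ~ 10^{log10_size:.1f}")  # roughly 10^122
```

For comparison, the number of atoms in the observable universe is commonly estimated at around 10^80, so brute-force enumeration is hopeless and structural signals are essential.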
Computer Science > Machine Learning

arXiv:2602.19845 (cs) [Submitted on 23 Feb 2026]
Title: I Dropped a Neural Net
Authors: Hyunwoo Park

Abstract: A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street: they trained a neural net, shuffled all 96 layers, and asked to put them back in order. Given unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into pairing each block's input and output projections ($48!$ possibilities) and ordering the reassembled blocks ($48!$ possibilities), for a combined search space of $(48!)^2 \approx 10^{122}$, more than the number of atoms in the observable universe. We show that stability conditions during training like dynamic isometry leave the product $W_{\text{out}} W_{\text{in}}$ for correctly paired layers with a negative diagonal structure, allowing us to use the diagonal dominance ratio as a signal for pairing. For ordering, we seed-initialize with a rough proxy such as delta-norm or $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.19845 [cs.LG] (or arXiv:2602.19845v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.19845
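The pairing signal can be illustrated on toy matrices. This is a sketch under the assumption that a correctly paired product W_out W_in is close to a negative identity, mimicking the negative diagonal structure the paper attributes to dynamic isometry; the matrix construction and the `diag_dominance` helper are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4  # hidden width and number of blocks (toy sizes; the paper has 48 blocks)

# Toy blocks: each correct pair is built so that W_out @ W_in is roughly
# minus the identity plus small noise (an assumption of this sketch).
W_in = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(k)]
W_out = [-np.linalg.pinv(w) + 0.05 * rng.normal(size=(d, d)) for w in W_in]

def diag_dominance(M):
    """Fraction of the matrix's absolute mass on the diagonal."""
    return np.abs(np.diag(M)).sum() / np.abs(M).sum()

# Score every (output, input) combination; correct pairs concentrate
# their mass on the diagonal, mismatched pairs spread it uniformly.
scores = np.array([[diag_dominance(W_out[i] @ W_in[j]) for j in range(k)]
                   for i in range(k)])
recovered = scores.argmax(axis=1)
print(recovered)  # each output should match its own input: [0 1 2 3]
```

A mismatched product behaves like a random matrix, putting only about 1/d of its mass on the diagonal, so the gap between matched and mismatched scores is large even with noisy weights.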
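The ordering stage described in the abstract (seed with a rough proxy such as $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error) can likewise be sketched with linear residual blocks. Making the weight norms grow with depth, so the norm proxy is informative, is an assumption of this toy, not a claim from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 8, 5, 64  # width, number of blocks, data points (toy sizes)

# Toy residual blocks whose weight norms grow with depth, so sorting
# by Frobenius norm gives a reasonable initial ordering.
blocks = [(1 + i) * rng.normal(size=(d, d)) / np.sqrt(d) for i in range(k)]
X = rng.normal(size=(n, d))

def forward(X, order):
    h = X
    for i in order:
        h = h + h @ blocks[i].T  # residual block: h <- h + W h
    return h

target = forward(X, range(k))  # outputs of the network in its true order

def mse(order):
    return float(np.mean((forward(X, order) - target) ** 2))

# Seed with the Frobenius-norm proxy, then hill-climb over pairwise
# swaps, accepting any swap that lowers the reconstruction error.
order = sorted(range(k), key=lambda i: np.linalg.norm(blocks[i]))
improved = True
while improved:
    improved = False
    for a in range(k):
        for b in range(a + 1, k):
            cand = order[:]
            cand[a], cand[b] = cand[b], cand[a]
            if mse(cand) < mse(order):
                order, improved = cand, True
print(order, mse(order))  # expect zero MSE once the true order is found
```

Because residual blocks do not commute in general, only the true ordering reproduces the target outputs exactly, which is why hitting zero MSE certifies the recovered order.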