[2511.16652] Evolution Strategies at the Hyperscale

arXiv - AI · 4 min read

Summary

The paper presents EGGROLL, an Evolution Strategies method that replaces full-rank random perturbations with low-rank ones, yielding up to a hundredfold training speedup for billion-parameter models along with theoretical insight into ES convergence in high dimensions.

Why It Matters

As machine learning models grow in size, efficient optimization methods become crucial. EGGROLL addresses the poor GPU efficiency (low arithmetic intensity) of traditional Evolution Strategies at scale, offering a scalable approach that maintains performance while dramatically increasing training speed, which is vital for applying ES to frontier-scale models.

Key Takeaways

  • EGGROLL improves the efficiency of Evolution Strategies (ES) for large models; a plain ES update is sketched after this list for context.
  • Achieves up to 91% of the throughput of pure batch inference by using structured low-rank perturbations.
  • Provides theoretical insights into the convergence of ES in high dimensions.
  • Demonstrates competitive performance on reasoning tasks with nonlinear recurrent models.
  • Maintains performance in reinforcement learning settings despite the much faster training.
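
For context, the sketch below shows the standard Gaussian ES update that EGGROLL builds on: perturb the parameters with population noise, score each perturbation, and move along the reward-weighted noise. This is the textbook estimator, not the paper's implementation; the objective and all hyperparameters are illustrative.

```python
# Plain Gaussian ES update (the baseline that EGGROLL accelerates).
# Textbook form; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def es_step(theta, fitness, pop=64, sigma=0.1, lr=0.05):
    """One ES ascent step on a flat parameter vector."""
    eps = rng.normal(size=(pop, theta.size))            # unstructured noise
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = (rewards[:, None] * eps).mean(axis=0) / sigma
    return theta + lr * grad

# Toy check: maximise -||theta - 3||^2, whose optimum is theta = 3.
theta = np.zeros(10)
for _ in range(300):
    theta = es_step(theta, lambda t: -np.sum((t - 3.0) ** 2))
print(theta.round(1))  # each coordinate hovers near 3
```

The expensive part at scale is the population dimension: every member needs its own perturbed forward pass, which is exactly the cost EGGROLL's low-rank structure attacks.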

Computer Science > Machine Learning · arXiv:2511.16652 (cs)
[Submitted on 20 Nov 2025 (v1), last revised 16 Feb 2026 (this version, v2)]

Title: Evolution Strategies at the Hyperscale
Authors: Bidipta Sarkar, Mattie Fellows, Juan Agustin Duque, Alistair Letcher, Antonio León Villares, Anya Sims, Clarisse Wibault, Dmitry Samsonov, Dylan Cope, Jarek Liesen, Kang Li, Lukas Seier, Theo Wolf, Uljad Berdica, Valentin Mohl, Alexander David Goldie, Aaron Courville, Karin Sevegnani, Shimon Whiteson, Jakob Nicolaus Foerster

Abstract: Evolution Strategies (ES) is a class of powerful black-box optimisation methods that are highly parallelisable and can handle non-differentiable and noisy objectives. However, naïve ES becomes prohibitively expensive at scale on GPUs due to the low arithmetic intensity of batched matrix multiplications with unstructured random perturbations. We introduce Evolution Guided GeneRal Optimisation via Low-rank Learning (EGGROLL), which improves arithmetic intensity by structuring individual perturbations as rank-$r$ matrices, resulting in a hundredfold increase in training speed for billion-parameter models at large population sizes, achieving up to 91% of the throughput of pure batch inference. We provide a rigorous theoretical analysis of Gaussian ES for high-dimensional parameter objectives, investigating conditions needed for ES updates to converge...
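The efficiency trick itself is easy to illustrate. In the sketch below, each population member's perturbation of a weight matrix W is a rank-r product A @ B.T, so the heavy W matmul is computed once as a single large GEMM shared across the population, while the per-member correction costs only two thin matmuls. The shapes, the einsum formulation, and the 1/sqrt(r) scaling are my reading of the abstract, not the released implementation.

```python
# Low-rank perturbation idea behind EGGROLL (my reconstruction from the
# abstract; names, shapes, and scaling are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 1024, 1024, 4   # layer shape, perturbation rank
pop, batch = 8, 32               # population size, batch per member
sigma = 0.01                     # ES noise scale

W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)   # shared weights

# Each member i carries E_i = A[i] @ B[i].T, stored as two thin factors
# instead of a full d_in x d_out perturbation matrix.
A = rng.normal(size=(pop, d_in, r))
B = rng.normal(size=(pop, d_out, r))

x = rng.normal(size=(pop, batch, d_in))              # per-member inputs

# Shared part: one big GEMM over the whole population...
shared = (x.reshape(pop * batch, d_in) @ W).reshape(pop, batch, d_out)
# ...plus a cheap rank-r correction per member.
low_rank = np.einsum('pbi,pir,por->pbo', x, A, B)
y = shared + (sigma / np.sqrt(r)) * low_rank

# Matches the naive per-member computation with full perturbation matrices.
y_naive = np.stack([
    x[i] @ (W + (sigma / np.sqrt(r)) * A[i] @ B[i].T) for i in range(pop)
])
assert np.allclose(y, y_naive)
```

Restoring one large GEMM is what recovers the arithmetic intensity the abstract refers to: the naive scheme does a small, memory-bound matmul per member, while here the population-sized batch keeps the GPU busy.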

Related Articles

LLMs

Seeking Critique on Research Approach to Open Set Recognition (Novelty Detection) [R]

Hey guys, I'm an independent researcher working on a project that tries to address a very specific failure mode in LLMs and embedding bas...

Reddit - Machine Learning · 1 min ·
Machine Learning

What if attention didn’t need matrix multiplication?

I built a cognitive architecture where all computation reduces to three bit operations: XOR, MAJ, POPCNT. No GEMM. No GPU. No floating-po...

Reddit - Artificial Intelligence · 1 min ·
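
As a rough illustration of the primitives named in the post above (my toy reading only, not the poster's architecture): XOR plus POPCNT yields a Hamming similarity that can stand in for a dot-product attention score, and MAJ is a bitwise majority vote usable as a mixing operation.

```python
# Toy matmul-free similarity using XOR, MAJ, and POPCNT on uint64 words.
# Illustrative only; not the architecture described in the post.
import numpy as np

def popcount(words):
    """POPCNT: total number of set bits in an array of uint64 words."""
    return int(np.unpackbits(words.view(np.uint8)).sum())

def hamming_similarity(q, k):
    """Number of bit positions where q and k agree (a score in [0, bits])."""
    return 64 * q.size - popcount(np.bitwise_xor(q, k))

def maj3(a, b, c):
    """MAJ: a bit is set where at least two of the three inputs are set."""
    return (a & b) | (a & c) | (b & c)

rng = np.random.default_rng(0)
q, k = rng.integers(0, 2**63, size=(2, 4), dtype=np.uint64)
print(hamming_similarity(q, k))      # ~128 of 256 bits agree at random
print(hex(maj3(q[0], q[1], k[0])))   # bitwise majority of three words
```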
Machine Learning

WTF. It's real. Allbirds (the shoe company) is pivoting to inference.

I'm profoundly ambivalent re: how to feel about this; is it great -- what a scrappy, bold pivot! Or wildly dumb -- it's so far from their c...

Reddit - Artificial Intelligence · 1 min ·
AI Infrastructure

Allbirds Is Pivoting to AI Compute. Sure, Why Not | WIRED

Once a $4 billion apparel juggernaut, Allbirds will rebrand as NewBird AI, a “GPU-as-a-Service” company. Hey, if you can't beat ’em, join...

Wired - AI · 5 min ·