[2602.21798] Excitation: Momentum For Experts

arXiv - Machine Learning · 3 min read

Summary

The paper introduces Excitation, a novel optimization framework aimed at enhancing learning in sparse architectures like Mixture-of-Experts (MoEs) by dynamically modulating updates based on expert utilization.

Why It Matters

Excitation addresses the limitations of traditional optimizers, which treat all parameters uniformly, in sparse deep learning models such as MoEs, improving both convergence speed and final performance. The framework is significant for researchers and practitioners who want to train sparse architectures effectively, especially in memory-constrained environments.

Key Takeaways

  • Excitation enhances learning in sparse architectures by modulating updates based on expert utilization.
  • It resolves the "structural confusion" issue in deep MoEs, where standard optimizers fail to establish functional signal paths, enabling stable training where baselines remain trapped.
  • The framework is optimizer-, domain-, and model-agnostic, requiring minimal integration effort.
  • Excitation improves convergence speed and final performance across various tasks.
  • It introduces no additional per-parameter optimizer state or learnable parameters, making it suitable for memory-constrained settings.

Computer Science > Machine Learning
arXiv:2602.21798 (cs) [Submitted on 25 Feb 2026]

Title: Excitation: Momentum For Experts
Authors: Sagi Shaier

Abstract: We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly-utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter optimizer state nor learnable parameters, making it highly viable for memory-constrained settings. Across language and vision tasks, Excitation consistently improves convergence speed and final performance in MoE models, indicating that active update modulation is a key mechanism for effective conditional computation.

Subjects: Machine L...
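To make the mechanism concrete, here is a minimal sketch of utilization-based update modulation. This is not the paper's exact rule; the function names, the linear scaling form, and the `alpha` and `floor` hyperparameters are illustrative assumptions. The idea shown is the one the abstract describes: rescale each expert's update by its batch-level routing utilization, amplifying heavily-routed experts and damping rarely-routed ones, without adding any per-parameter optimizer state.

```python
import numpy as np

def excitation_scale(utilization, alpha=1.0, floor=0.1):
    """Map per-expert utilization to per-expert update multipliers.

    Hypothetical modulation rule (an assumption, not the paper's formula):
    utilization: array of shape (num_experts,), fraction of batch tokens
                 routed to each expert (sums to 1).
    alpha:       modulation strength.
    floor:       minimum multiplier, so low-utilization experts are
                 suppressed rather than frozen outright.
    """
    num_experts = len(utilization)
    # Centered so uniform routing (1/num_experts each) gives a multiplier of 1.
    scale = 1.0 + alpha * (utilization * num_experts - 1.0)
    return np.clip(scale, floor, None)

def apply_excitation(expert_params, expert_grads, utilization, lr=0.01):
    """One SGD step whose per-expert gradients are modulated by utilization.

    Each expert's whole parameter block shares one scalar multiplier,
    so no extra per-parameter state is kept.
    """
    scale = excitation_scale(np.asarray(utilization))
    return [p - lr * s * g
            for p, g, s in zip(expert_params, expert_grads, scale)]
```

With uniform routing every multiplier is 1 and the step reduces to plain SGD; with skewed routing the dominant expert's update is amplified while rarely-used experts receive a damped (but nonzero, thanks to `floor`) update, which is the competitive dynamic the abstract describes.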
