[2602.22059] NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training


arXiv - AI · 3 min read

Summary

The paper introduces NESTOR, a nested Mixture-of-Experts (MoE) neural operator for large-scale PDE pre-training that improves computational efficiency and model transferability by selectively activating expert networks.

Why It Matters

This research addresses the limitations of traditional numerical methods for solving partial differential equations (PDEs) by leveraging neural operator architectures. The proposed approach can significantly improve the efficiency of PDE modeling, which matters for a wide range of scientific and engineering applications.

Key Takeaways

  • NESTOR utilizes a nested MoE framework to enhance neural operator capabilities.
  • The model effectively captures both global and local dependencies in PDEs.
  • Large-scale pre-training on diverse datasets demonstrates improved generalization.
  • The approach allows selective activation of expert networks for better performance.
  • Results indicate strong transferability to downstream tasks.

Abstract

Computer Science > Computer Vision and Pattern Recognition · arXiv:2602.22059 (cs) · Submitted 25 Feb 2026

Authors: Dengdi Sun, Xiaoya Zhou, Xiao Wang, Hao Si, Wanli Lyu, Jin Tang, Bin Luo

Neural operators have emerged as an efficient paradigm for solving PDEs, overcoming the limitations of traditional numerical methods and significantly improving computational efficiency. However, due to the diversity and complexity of PDE systems, existing neural operators typically rely on a single network architecture, which limits their capacity to fully capture heterogeneous features and complex system dependencies. This constraint poses a bottleneck for large-scale PDE pre-training based on neural operators. To address these challenges, we propose a large-scale PDE pre-trained neural operator based on a nested Mixture-of-Experts (MoE) framework. In particular, the image-level MoE is designed to capture global dependencies, while the token-level Sub-MoE focuses on local dependencies. Our model can selectively activate the most suitable expert networks for a given input, thereby enhancing generalization and transferability. We conduct large-scale pre-training on twelve PDE datasets from diverse sources and successfully transfer...
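To make the nested routing idea concrete, here is a minimal NumPy sketch of a two-level MoE forward pass: a global (image-level) gate picks one Sub-MoE for the whole input, and that Sub-MoE then routes each token to its own expert. This is not the authors' implementation — the linear experts, top-1 gating, mean-pooled global gate, and all dimensions are illustrative assumptions.

```python
# Hypothetical nested MoE sketch (illustrative, not the NESTOR code).
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class Expert:
    """Toy token-level expert: a single linear map."""
    def __init__(self, dim):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    def __call__(self, x):
        return x @ self.w

class SubMoE:
    """Token-level MoE: each token is routed to its own top-1 expert."""
    def __init__(self, dim, n_experts):
        self.experts = [Expert(dim) for _ in range(n_experts)]
        self.gate = rng.standard_normal((dim, n_experts)) / np.sqrt(dim)
    def __call__(self, tokens):                    # tokens: (n_tokens, dim)
        scores = softmax(tokens @ self.gate)       # (n_tokens, n_experts)
        choice = scores.argmax(axis=-1)            # per-token top-1 routing
        out = np.empty_like(tokens)
        for e_idx, expert in enumerate(self.experts):
            mask = choice == e_idx
            if mask.any():
                out[mask] = expert(tokens[mask])   # local dependencies
        return out

class NestedMoE:
    """Image-level MoE whose experts are themselves token-level Sub-MoEs."""
    def __init__(self, dim, n_image_experts, n_token_experts):
        self.sub_moes = [SubMoE(dim, n_token_experts)
                         for _ in range(n_image_experts)]
        self.gate = rng.standard_normal((dim, n_image_experts)) / np.sqrt(dim)
    def __call__(self, tokens):
        # Global routing: pool all tokens into one summary vector and
        # select a single Sub-MoE for the whole input.
        pooled = tokens.mean(axis=0)
        choice = int(softmax(pooled @ self.gate).argmax())
        # The chosen Sub-MoE then routes per token (local routing).
        return self.sub_moes[choice](tokens)

tokens = rng.standard_normal((16, 8))   # 16 tokens with 8-dim features
model = NestedMoE(dim=8, n_image_experts=3, n_token_experts=4)
out = model(tokens)
print(out.shape)   # (16, 8): same shape as the input token grid
```

Because only one Sub-MoE (and, inside it, one expert per token) is active per input, compute stays roughly constant as the total expert count grows — the usual motivation for sparse MoE routing.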


