[2602.19244] Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts
Summary
This paper presents a Soft Mixture-of-Experts (Soft-MoE) framework for On-the-fly Directed Controller Synthesis (OTF-DCS). It combines multiple reinforcement-learning exploration policies through a prior-confidence gating mechanism to improve robustness and generalization across the domain-parameter space.
Why It Matters
The research addresses anisotropic generalization, a limitation in which an RL policy performs well only in a specific region of the domain-parameter space and remains fragile elsewhere. By treating these uneven behaviors as complementary specializations and combining multiple experts, the framework aims to make learned exploration policies more robust. The evaluation uses the Air Traffic benchmark.
Key Takeaways
- Introduces a Soft Mixture-of-Experts framework to improve exploration in directed controller synthesis.
- Addresses the challenge of anisotropic generalization in reinforcement learning.
- Reports a substantially larger solvable parameter space and improved robustness on the Air Traffic benchmark.
- Highlights that OTF-DCS relies critically on its exploration policy to guide search efficiently.
- Suggests relevance to complex domains such as air traffic management, though the reported evaluation is on the Air Traffic benchmark.
Computer Science > Artificial Intelligence
arXiv:2602.19244 (cs)
[Submitted on 22 Feb 2026]
Title: Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts
Authors: Toshihide Ubukata, Zhiyao Wang, Enhong Mu, Jialong Li, Kenji Tei
Abstract: On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single ...
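The abstract does not spell out how the prior-confidence gate is computed, so the following Python sketch is only one plausible reading, not the authors' method. It assumes each expert scores the same list of candidate exploration actions, that a per-expert prior is supplied (for example from validation performance), and that confidence is measured as one minus the normalized entropy of the expert's action distribution. The names `soft_moe_mix`, `confidence`, and `pick_action` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one expert's raw action scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs: Sequence[float]) -> float:
    """Assumed confidence measure: 1 - normalized entropy, clamped to [0, 1]."""
    if len(probs) < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return min(1.0, max(0.0, 1.0 - entropy / math.log(len(probs))))


def soft_moe_mix(
    expert_scores: Sequence[Sequence[float]],
    priors: Sequence[float],
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Blend expert action distributions using gates = prior * confidence.

    Returns (mixed action distribution, normalized gate weights).
    eps keeps the gates positive when every expert is maximally uncertain.
    """
    dists = [softmax(s) for s in expert_scores]
    gates = [p * confidence(d) + eps for p, d in zip(priors, dists)]
    gate_sum = sum(gates)
    weights = [g / gate_sum for g in gates]
    n_actions = len(dists[0])
    mixed = [
        sum(w * d[a] for w, d in zip(weights, dists)) for a in range(n_actions)
    ]
    return mixed, weights


def pick_action(
    expert_scores: Sequence[Sequence[float]], priors: Sequence[float]
) -> int:
    """Choose the index of the action with the highest mixed probability."""
    mixed, _ = soft_moe_mix(expert_scores, priors)
    return max(range(len(mixed)), key=mixed.__getitem__)


if __name__ == "__main__":
    # Expert 0 strongly prefers action 0; expert 1 is almost undecided.
    scores = [[5.0, 0.0, 0.0], [0.0, 0.1, 0.0]]
    print(pick_action(scores, priors=[0.5, 0.5]))
```

In this sketch every expert always contributes (the "soft" part), but an expert that is nearly undecided in the current state receives a small gate weight, so a confident expert dominates the mixed decision. A hard mixture would instead route each decision to a single expert.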