[2602.22685] Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting
Summary
The paper introduces Switch-Hurdle, a novel framework combining a Mixture-of-Experts encoder and an autoregressive hurdle decoder to improve forecasting for intermittent demand in retail and supply chains.
Why It Matters
Intermittent demand forecasting is crucial for efficient inventory management in retail and supply chains. Traditional methods often fail to capture demand patterns dominated by long runs of zeros interrupted by sporadic positive values. Switch-Hurdle addresses these challenges, offering a scalable solution that improves prediction accuracy, which can lead to better resource allocation and reduced costs.
Key Takeaways
- Switch-Hurdle integrates a Mixture-of-Experts encoder with a hurdle-based decoder.
- The model separates the forecasting task into binary classification and conditional regression components.
- Empirical results indicate state-of-the-art performance on benchmark datasets.
- The framework maintains scalability, making it suitable for large datasets.
- Addresses limitations of traditional forecasting methods for intermittent demand.
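The hurdle factorization in the takeaways above can be made concrete as a likelihood: a Bernoulli gate decides whether any sale occurs, and a conditional distribution models the demand size when it does. The summary does not state which conditional distribution the authors use, so the log-normal magnitude model in this sketch is an illustrative assumption, not the paper's specification:

```python
import math

def hurdle_nll(y, p_sale, mu, sigma):
    """Negative log-likelihood of a hurdle model for one observation.

    y: observed demand (0 or a positive value)
    p_sale: predicted probability of a non-zero sale (Bernoulli gate)
    mu, sigma: parameters of an assumed log-normal for positive demand
    """
    if y == 0:
        # No sale: only the Bernoulli gate contributes.
        return -math.log(1.0 - p_sale)
    # A sale occurred: Bernoulli part...
    nll = -math.log(p_sale)
    # ...plus the conditional log-normal density of the magnitude.
    z = (math.log(y) - mu) / sigma
    nll += 0.5 * z * z + math.log(sigma * y * math.sqrt(2.0 * math.pi))
    return nll
```

Training the two components jointly under this loss lets the classifier absorb the zero inflation while the regression head fits only the positive observations.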
Computer Science > Machine Learning, arXiv:2602.22685 (cs) [Submitted on 26 Feb 2026]
Authors: Fabian Muşat, Simona Căbuz
Abstract
Intermittent demand, a pattern characterized by long sequences of zero sales punctuated by sporadic, non-zero values, poses a persistent challenge in retail and supply chain forecasting. Traditional methods such as ARIMA, exponential smoothing, and Croston variants, as well as modern neural architectures such as DeepAR and Transformer-based models, often underperform on such data: they treat demand as a single continuous process or become computationally expensive when scaled across many sparse series. To address these limitations, we introduce Switch-Hurdle, a new framework that integrates a Mixture-of-Experts (MoE) encoder with a hurdle-based probabilistic decoder. The encoder uses sparse Top-1 expert routing in the forward pass while remaining approximately dense in the backward pass via a straight-through estimator (STE). The decoder follows a cross-attention autoregressive design with a shared hurdle head that explicitly separates the forecasting task into two components: a binary classification component estimating the probability of a sale, and a conditional regression component modeling demand magnitude given that a sale occurs.
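The Top-1 routing described in the abstract can be sketched as follows. The abstract gives no implementation details, so the softmax router, the scaling of the expert output by its gate probability, and the expert functions here are all illustrative assumptions; the straight-through estimator itself only matters under autograd and is indicated in a comment:

```python
import numpy as np

def top1_route(x, W_gate, experts):
    """Sparse Top-1 MoE forward pass for a single token.

    x: (d,) token representation
    W_gate: (d, n_experts) router weights (assumed linear router)
    experts: list of callables, one per expert

    Forward: only the argmax expert runs (sparse compute).
    Backward (in an autograd framework): the straight-through estimator
    would use gate = onehot(k) + probs - stop_gradient(probs), so the
    forward value stays hard while gradients see the dense softmax.
    """
    logits = x @ W_gate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = int(np.argmax(probs))
    # Scale the selected expert's output by its gate probability so the
    # router receives a learning signal through the gate value.
    return probs[k] * experts[k](x), k
```

Only one expert executes per token, which is what keeps the encoder cheap when scaled across many sparse series, while the STE keeps the router trainable.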