[2509.22295] Aurora: Towards Universal Generative Multimodal Time Series Forecasting
Summary
Aurora is a multimodal time series foundation model that improves cross-domain generalization in time series forecasting by integrating multimodal inputs (text and images) and supporting zero-shot inference.
Why It Matters
This research addresses a significant gap in time series forecasting by enabling models to utilize diverse data types (text, images) effectively, leading to improved predictive performance across different domains. The implications for industries relying on accurate forecasting are substantial, as it can enhance decision-making processes.
Key Takeaways
- Aurora supports multimodal inputs for better time series forecasting.
- The model enables zero-shot inference, enhancing its applicability across domains.
- Comprehensive experiments show Aurora achieves state-of-the-art performance on multiple benchmarks.
- The approach utilizes Modality-Guided Multi-head Self-Attention for effective feature extraction.
- Aurora's design addresses limitations of existing unimodal and end-to-end multimodal models.
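To make the fourth takeaway concrete, below is a minimal NumPy sketch of modality-guided attention: an embedding from an auxiliary modality (e.g. a text description of the domain) forms the query, and the time-series patch tokens supply keys and values, so the modality steers which parts of the history the model attends to. The function name, shapes, and random stand-in weights are illustrative assumptions; the paper's exact Modality-Guided Multi-head Self-Attention formulation is not given in this summary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modality_guided_attention(series_tokens, modality_emb, n_heads=4):
    """Cross-attention where the query comes from a modality embedding
    (e.g. text) and keys/values come from time-series patch tokens.

    series_tokens: (T, d) array of patch tokens
    modality_emb:  (d,) embedding of the text/image modality
    Returns a pooled (d,) context vector and the (n_heads, T) attention map.
    Hypothetical sketch: random projections stand in for learned weights.
    """
    T, d = series_tokens.shape
    dh = d // n_heads
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (modality_emb @ Wq).reshape(n_heads, dh)           # one query per head
    k = (series_tokens @ Wk).reshape(T, n_heads, dh)
    v = (series_tokens @ Wv).reshape(T, n_heads, dh)
    scores = np.einsum('hd,thd->ht', q, k) / np.sqrt(dh)   # (n_heads, T)
    attn = softmax(scores, axis=-1)                        # weights over time
    ctx = np.einsum('ht,thd->hd', attn, v)                 # pooled per head
    return ctx.reshape(d), attn
```

Because the query is modality-derived, two identical histories paired with different domain descriptions yield different attention maps, which is one plausible mechanism for the cross-domain behavior the takeaways describe.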
Computer Science > Machine Learning
arXiv:2509.22295 (cs)
[Submitted on 26 Sep 2025 (v1), last revised 23 Feb 2026 (this version, v5)]
Title: Aurora: Towards Universal Generative Multimodal Time Series Forecasting
Authors: Xingjian Wu, Jianxin Jin, Wanghui Qiu, Peng Chen, Yang Shu, Bin Yang, Chenjuan Guo
Abstract: Cross-domain generalization is important in time series forecasting because similar historical information may lead to distinct future trends due to domain-specific characteristics. Recent works focus on building unimodal time series foundation models and end-to-end multimodal supervised models. Since domain-specific knowledge is often contained in modalities such as text, the former lacks explicit utilization of it, which hinders performance. The latter is tailored for end-to-end scenarios and does not support zero-shot inference in cross-domain settings. In this work, we introduce Aurora, a Multimodal Time Series Foundation Model that supports multimodal inputs and zero-shot inference. Pretrained on a Cross-domain Multimodal Time Series Corpus, Aurora can adaptively extract and focus on key domain knowledge contained in the corresponding text or image modalities, giving it strong cross-domain generalization capability. Through tokenization, encoding, and distillation, Aurora ca...
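The abstract mentions tokenization as the first stage of Aurora's pipeline. A common tokenization scheme for time series foundation models is to cut the series into fixed-length patches that serve as tokens; the sketch below shows that scheme under assumed parameter names (`patch_len`, `stride`), since Aurora's exact tokenizer is not described in this excerpt.

```python
import numpy as np

def patch_tokenize(series, patch_len=16, stride=16):
    """Split a 1-D series into fixed-length patches (tokens).

    With stride == patch_len the patches are non-overlapping; a smaller
    stride yields overlapping patches. Illustrative sketch only.
    """
    n = (len(series) - patch_len) // stride + 1
    # gather indices: row i covers series[i*stride : i*stride + patch_len]
    idx = np.arange(patch_len)[None, :] + stride * np.arange(n)[:, None]
    return series[idx]  # shape (n_patches, patch_len)

# e.g. a length-64 series with default settings yields 4 patches of length 16
tokens = patch_tokenize(np.arange(64.0))
```

Each patch would then be linearly projected into the model dimension before entering the encoder, analogous to patch embedding in vision transformers.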