[2507.16696] FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation
Summary
FISHER is a proposed foundation model aimed at improving the analysis of multi-modal industrial signals, addressing the challenges posed by signal heterogeneity in SCADA systems.
Why It Matters
As industries increasingly rely on SCADA systems for monitoring, analyzing diverse industrial signals effectively is crucial for operational efficiency and safety. FISHER's unified approach could improve signal representation and anomaly detection across modalities that previous work handled with separate, specialized models.
Key Takeaways
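- FISHER addresses the M5 problem, the authors' label for the significant heterogeneity of industrial signals, by modeling those signals in a unified manner.
- The model takes the STFT sub-band as its modeling unit and is pre-trained with a teacher-student self-supervised learning (SSL) framework.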
- FISHER demonstrates up to 4.2% performance improvement over existing SSL models.
- The RMIS benchmark is introduced to evaluate multi-modal industrial signals across various tasks.
- Both FISHER and RMIS are open-sourced, promoting further research and application.
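The teacher-student pre-training named in the takeaways can be illustrated with a minimal sketch. The source only says that a teacher-student SSL framework is used. The exponential-moving-average (EMA) teacher, the mean-squared-error objective, and the names `ema_update` and `distill_loss` below are assumptions borrowed from common self-distillation setups, not details confirmed by the paper.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Return new teacher parameters moved a small step toward the student.

    ASSUMPTION: an EMA teacher is a common choice in teacher-student SSL;
    the source does not state how FISHER's teacher is updated.
    """
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

def distill_loss(student_out, teacher_out):
    """Mean squared error between student predictions and teacher targets.

    ASSUMPTION: the regression objective is illustrative only. The teacher
    targets are treated as constants (no gradient flows through them).
    """
    return float(np.mean((student_out - teacher_out) ** 2))
```

In a training loop of this kind, the student would be updated by gradient descent on `distill_loss`, and `ema_update` would be called after each step so the teacher trails the student smoothly.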
Computer Science > Machine Learning, arXiv:2507.16696 (cs)
Submitted on 22 Jul 2025 (v1), last revised 13 Feb 2026 (this version, v2)
Authors: Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian, Xie Chen, Cheng Lu, Jia Liu
Abstract: With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner due to the intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher-student SSL framework for pre-training. We also develop the...
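The idea that a higher sampling rate simply contributes more sub-bands can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 25 ms window, 10 ms hop, 50 bins per sub-band, and the function names `stft_magnitude` and `split_subbands` are all hypothetical choices. Fixing the window length in seconds keeps the frequency-bin width in Hz constant, so a sub-band of a given number of bins covers the same bandwidth at any sampling rate.

```python
import numpy as np

def stft_magnitude(x, sr, win_s=0.025, hop_s=0.010):
    """Magnitude STFT with window and hop fixed in seconds (assumed values).

    The bin width is sr / n_fft = 1 / win_s Hz, independent of sr.
    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    n_fft = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def split_subbands(spec, bins_per_band):
    """Cut the frequency axis into equal-width sub-bands.

    Leftover bins at the top of the spectrum are dropped.
    Returns an array of shape (n_bands, n_frames, bins_per_band).
    """
    n_frames, n_bins = spec.shape
    n_bands = n_bins // bins_per_band
    trimmed = spec[:, :n_bands * bins_per_band]
    return trimmed.reshape(n_frames, n_bands, bins_per_band).transpose(1, 0, 2)

# One second of noise at two sampling rates: each sub-band has the same shape,
# and the higher rate only adds more of them.
rng = np.random.default_rng(0)
for sr in (16000, 32000):
    bands = split_subbands(stft_magnitude(rng.standard_normal(sr), sr), 50)
    print(sr, bands.shape)
```

With these assumed settings each bin is 40 Hz wide and each sub-band spans 2 kHz, so a 16 kHz signal yields 4 sub-bands and a 32 kHz signal yields 8. A model that consumes one sub-band at a time therefore sees identically shaped inputs at either rate.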