[2602.15084] TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
Summary
TokaMind is a new open-source multi-modal transformer foundation model for tokamak plasma dynamics; when fine-tuned, it outperforms the baselines on the TokaMark fusion plasma modeling benchmark.
Why It Matters
This research presents a significant advancement in fusion plasma modeling by leveraging multi-modal data and transformer architectures, which can enhance predictive capabilities in plasma physics. The open-source nature of TokaMind encourages collaboration and further innovation in the field.
Key Takeaways
- TokaMind utilizes a multi-modal transformer framework for improved plasma modeling.
- The model supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates and robust missing-signal handling, enhancing its adaptability to different tasks.
- Fine-tuning TokaMind yields better results than training from scratch in many scenarios.
- The research highlights the importance of multi-modal pretraining for strong downstream performance.
- Training code and model weights will be publicly available, promoting further research.
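The paper describes efficient task adaptation by selectively loading and freezing four model components, so that fine-tuning touches only a subset of parameters. A minimal sketch of that pattern follows; the component names and parameter counts are illustrative assumptions, not the paper's actual architecture:

```python
# Hedged sketch: task adaptation by freezing a subset of model components.
# Component names (encoder, fusion, temporal, head) and sizes are hypothetical;
# the paper only states that four components can be loaded/frozen selectively.

class Component:
    def __init__(self, name, n_params):
        self.name = name
        self.n_params = n_params
        self.trainable = True  # by default, all parameters receive gradients

    def freeze(self):
        self.trainable = False  # excluded from the optimizer during fine-tuning

def trainable_param_count(components):
    # Total parameters the fine-tuning optimizer would actually update.
    return sum(c.n_params for c in components.values() if c.trainable)

model = {
    "encoder":  Component("encoder", 1_000_000),
    "fusion":   Component("fusion",    500_000),
    "temporal": Component("temporal",  800_000),
    "head":     Component("head",       10_000),
}

# Fine-tune only the task head: freeze every other component.
for name, comp in model.items():
    if name != "head":
        comp.freeze()

print(trainable_param_count(model))  # 10000
```

In a real framework the same idea is expressed by loading pretrained weights per component and disabling gradient computation on the frozen ones, which is what makes adaptation to a new task cheap relative to training from scratch.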
Physics > Plasma Physics
arXiv:2602.15084 (physics)
[Submitted on 16 Feb 2026]
Title: TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
Authors: Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats
Abstract: We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders - VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseli...
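The abstract mentions a training-free DCT3D embedding for representing multi-modal signals. The paper's exact construction is not reproduced here, but the general idea behind such an embedding, applying a separable 3D Discrete Cosine Transform to a signal block and keeping the low-frequency coefficients as a fixed-size vector, can be sketched as follows. The function names, the `keep` size, and the clip shape are illustrative assumptions:

```python
import numpy as np

def dct2_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n), built directly from its definition.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.cos(np.pi * (m + 0.5) * k / n)
    C[0] *= 1 / np.sqrt(2)        # normalize the DC row
    return C * np.sqrt(2 / n)

def dct3d_embed(x, keep=(4, 4, 4)):
    """Training-free 3D DCT embedding: apply the DCT along each axis,
    keep only the low-frequency corner, and flatten to a vector."""
    for axis, n in enumerate(x.shape):
        C = dct2_matrix(n)
        # Contract the DCT matrix against the chosen axis of x.
        x = np.moveaxis(np.tensordot(C, np.moveaxis(x, axis, 0), axes=1), 0, axis)
    low = x[:keep[0], :keep[1], :keep[2]]
    return low.ravel()

# Example: a (time, height, width) diagnostic video clip -> 64-dim embedding.
clip = np.random.default_rng(0).normal(size=(16, 32, 32))
emb = dct3d_embed(clip)
print(emb.shape)  # (64,)
```

Because the transform is a fixed orthonormal basis, no training data is needed to fit the embedding, which is what "training-free" means here; the paper's interface also allows swapping in learned alternatives such as VAEs.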