[2602.16626] A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models


Summary

This article evaluates sample-level tokenization strategies for MEG foundation models, comparing learnable and non-learnable approaches in terms of signal reconstruction fidelity and their effect on downstream foundation-model performance.

Why It Matters

Understanding tokenization strategies is crucial for improving the performance of large-scale neuroimaging models. This research provides insights into how different tokenization methods affect data fidelity and modeling outcomes, which can inform future developments in neuroimaging and machine learning.

Key Takeaways

  • Both learnable and non-learnable tokenization methods show high reconstruction accuracy.
  • Simple fixed sample-level tokenization can be effective for developing neural foundation models.
  • The study uses diverse MEG datasets to validate the findings across different conditions.
  • A novel autoencoder-based approach for learnable tokenization is introduced.
  • Results indicate comparable performance across various evaluation criteria for both tokenization strategies.
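The "simple fixed sample-level tokenization" mentioned above can be illustrated with a minimal sketch: each continuous sample is mapped to one of a fixed number of uniformly spaced bins, and detokenization returns the bin centre. This is a generic illustration of a non-learnable quantizer, not the paper's exact method; the value range and vocabulary size here are illustrative assumptions.

```python
import numpy as np

def tokenize_uniform(signal, n_tokens=256, lo=-1.0, hi=1.0):
    """Map each continuous sample to one of n_tokens discrete bins
    via fixed uniform quantization (a non-learnable tokenizer).
    The range [lo, hi] and n_tokens are illustrative choices."""
    clipped = np.clip(signal, lo, hi)
    # Scale to [0, 1], then to integer token ids in [0, n_tokens - 1].
    scaled = (clipped - lo) / (hi - lo)
    return np.minimum((scaled * n_tokens).astype(int), n_tokens - 1)

def detokenize_uniform(tokens, n_tokens=256, lo=-1.0, hi=1.0):
    """Reconstruct each sample as the centre of its bin."""
    return lo + (tokens + 0.5) * (hi - lo) / n_tokens

x = np.array([-0.9, 0.0, 0.42, 0.99])
t = tokenize_uniform(x)
x_hat = detokenize_uniform(t)
# Reconstruction error is bounded by half a bin width (1/256 here),
# which is why fixed tokenizers can already reconstruct accurately.
```

With 256 tokens over [-1, 1], the worst-case per-sample error is half a bin width, so high reconstruction accuracy from a fixed tokenizer (as the takeaways report) is unsurprising for suitably normalized signals.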

Computer Science > Machine Learning
arXiv:2602.16626 (cs) [Submitted on 18 Feb 2026]

Title: A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models
Authors: SungJun Cho, Chetan Gohil, Rukuang Huang, Oiwi Parker Jones, Mark W. Woolrich

Abstract: Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data. Such models often require discretization of continuous neural time series data, a process referred to as 'tokenization'. However, the impact of different tokenization strategies for neural data is currently poorly understood. In this work, we present a systematic evaluation of sample-level tokenization strategies for transformer-based large neuroimaging models (LNMs) applied to magnetoencephalography (MEG) data. We compare learnable and non-learnable tokenizers by examining their signal reconstruction fidelity and their impact on subsequent foundation modeling performance (token prediction, biological plausibility of generated data, preservation of subject-specific information, and performance on downstream tasks). For the learnable tokenizer, we introduce a novel approach based on an autoencoder. Experiments were conducted on three publicly available MEG datasets spanning different acquis...
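The paper's learnable tokenizer is autoencoder-based; its architecture is not detailed in the excerpt above. As a loose, hypothetical stand-in only, the sketch below learns a scalar codebook by k-means, so that encoding a sample means looking up its nearest learned centroid (the token id) and decoding returns that centroid. All names and parameters here are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_codebook(samples, n_tokens=16, n_iters=20):
    """Learn a scalar codebook via k-means: each centroid serves as the
    decoded value for one token id (a minimal 'learnable' tokenizer).
    This is a simplified stand-in, not the paper's autoencoder."""
    codebook = rng.choice(samples, size=n_tokens, replace=False)
    for _ in range(n_iters):
        # Assign each sample to its nearest centroid.
        ids = np.argmin(np.abs(samples[:, None] - codebook[None, :]), axis=1)
        # Move each centroid to the mean of its assigned samples.
        for k in range(n_tokens):
            members = samples[ids == k]
            if members.size:
                codebook[k] = members.mean()
    return np.sort(codebook)

def encode(samples, codebook):
    """Tokenize: nearest-centroid lookup returns integer token ids."""
    return np.argmin(np.abs(samples[:, None] - codebook[None, :]), axis=1)

def decode(ids, codebook):
    """Detokenize: each token id maps back to its centroid value."""
    return codebook[ids]

x = rng.standard_normal(2000)       # stand-in for normalized MEG samples
cb = fit_codebook(x)
x_hat = decode(encode(x, cb), cb)
```

Unlike the fixed uniform quantizer, the codebook adapts its bin placement to the data distribution, which is the basic appeal of learnable tokenization; the paper's finding is that this added flexibility yields performance comparable to the fixed scheme at the sample level.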
