[2602.16914] A statistical perspective on transformers for small longitudinal cohort data
Summary
This paper presents a simplified transformer architecture tailored to small longitudinal cohort data, achieving competitive predictive performance while uncovering complex temporal dependencies between variables.
Why It Matters
Transformers have revolutionized data modeling in various fields, yet their application to small longitudinal datasets has been limited. This research addresses this gap, providing a method that could improve insights in fields like psychology and healthcare, where data may be scarce.
Key Takeaways
- Introduces a simplified transformer architecture for small datasets.
- Utilizes attention mechanisms to prioritize relevant historical data.
- Demonstrates competitive predictive performance in small longitudinal studies.
- Enables statistical testing for contextual pattern identification.
- Applies findings to real-world data, revealing dynamics in stress and mental health.
Statistics > Methodology
arXiv:2602.16914 (stat) [Submitted on 18 Feb 2026]
Title: A statistical perspective on transformers for small longitudinal cohort data
Authors: Kiana Farhadyar, Maren Hackenberg, Kira Ahrens, Charlotte Schenk, Bianca Kollmann, Oliver Tüscher, Klaus Lieb, Michael M. Plichta, Andreas Reif, Raffael Kalisch, Martin Wolkewitz, Moritz Hess, Harald Binder
Abstract: Modeling of longitudinal cohort data typically involves complex temporal dependencies between multiple variables. There, the transformer architecture, which has been highly successful in language and vision applications, allows us to account for the fact that the most recently observed time points in an individual's history may not always be the most important for the immediate future. This is achieved by assigning attention weights to observations of an individual based on a transformation of their values. One reason why these ideas have not yet been fully leveraged for longitudinal cohort data is that typically, large datasets are required. Therefore, we present a simplified transformer architecture that retains the core attention mechanism while reducing the number of parameters to be estimated, to be more suitable for small datasets with few time points. Guided by a statistical perspective on transformers, we use an au...
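The abstract describes assigning attention weights to an individual's observed time points based on a transformation of their values, so that the most recent observation need not dominate. The sketch below illustrates that general idea with standard scaled dot-product attention over one individual's sequence; the function name, the single shared projection matrix, and the toy dimensions are illustrative assumptions, not the paper's specific simplified architecture, whose exact parameterization is not given in the excerpt above.

```python
import numpy as np

def attention_weights(x, w):
    """Scaled dot-product attention over one individual's time points.

    x : (T, p) array of T observed time points with p variables each.
    w : (p, d) projection matrix (a single shared projection here, an
        illustrative simplification of the "transformation of their values"
        mentioned in the abstract).
    Returns a (T, T) matrix whose row t gives the weight each observation
    receives when summarizing the individual's history at time point t.
    """
    z = x @ w                                   # transform the raw observations
    scores = z @ z.T / np.sqrt(z.shape[1])      # pairwise similarity of time points
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    return scores / scores.sum(axis=1, keepdims=True)   # softmax per time point

# Toy example: 5 time points, 3 variables, 2-dimensional projection.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
w = rng.normal(size=(3, 2))
a = attention_weights(x, w)
print(a.round(2))   # each row sums to 1; large entries mark influential time points
```

In this reading, rows with weight concentrated away from the most recent column correspond to cases where earlier observations matter more for the immediate future, which is the behavior the attention mechanism is meant to capture in small longitudinal cohorts.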