[2602.19661] PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information
Summary
The paper presents PaReGTA, an LLM-based framework for encoding temporal information in electronic health records (EHRs), enhancing patient representation and classification accuracy.
Why It Matters
This research addresses the challenge of capturing temporal data in EHRs, which is crucial for improving patient care and outcomes. By utilizing a lightweight, pre-trained LLM approach, PaReGTA offers a scalable solution that can be applied to various healthcare datasets, making it relevant for researchers and practitioners in the field of health informatics.
Key Takeaways
- PaReGTA encodes longitudinal EHR events into structured templates with temporal cues.
- The framework uses lightweight contrastive fine-tuning for domain-adapted embeddings.
- It aggregates visit embeddings to create a fixed-dimensional patient representation.
- PaReGTA outperforms sparse one-hot and count-based baselines on migraine classification tasks.
- The framework is model-agnostic and can leverage future EHR-specialized sentence-embedding models.
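The first takeaway, converting longitudinal EHR events into templated visit text with explicit temporal cues, can be illustrated with a minimal sketch. The template wording, the event fields (`type`, `desc`), and the gap phrasing below are hypothetical stand-ins, not the paper's actual templates:

```python
from datetime import date

def visit_to_template(visit_date, prior_visit_date, events):
    """Render one visit's structured events as templated text.

    Temporal cues are made explicit: the visit date and the gap
    since the previous visit are written into the sentence, so a
    sentence-embedding model can pick them up from plain text.
    """
    if prior_visit_date is None:
        gap = "first recorded visit"
    else:
        days = (visit_date - prior_visit_date).days
        gap = f"{days} days after the previous visit"
    event_text = "; ".join(f"{e['type']}: {e['desc']}" for e in events)
    return f"Visit on {visit_date.isoformat()} ({gap}). Events: {event_text}."

# Example: a follow-up visit two weeks after the prior one.
text = visit_to_template(
    date(2025, 3, 15),
    date(2025, 3, 1),
    [{"type": "diagnosis", "desc": "migraine without aura"},
     {"type": "medication", "desc": "sumatriptan 50mg"}],
)
```

Each visit's templated string would then be fed to the (contrastively fine-tuned) sentence-embedding model to produce one visit embedding.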
Computer Science > Machine Learning
arXiv:2602.19661 (cs) [Submitted on 23 Feb 2026]
Title: PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information
Authors: Kihyuk Yoon, Lingchao Mao, Catherine Chong, Todd J. Schwedt, Chia-Chun Chiang, Jing Li
Abstract: Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose PaReGTA, an LLM-based encoding framework that (i) converts longitudinal EHR events into visit-level templated text with explicit temporal cues, (ii) learns domain-adapted visit embeddings via lightweight contrastive fine-tuning of a sentence-embedding model, and (iii) aggregates visit embeddings into a fixed-dimensional patient representation using hybrid temporal pooling that captures both recency and globally informative visits. Because PaReGTA does not require training from scratch but instead utilizes a pre-trained LLM, it can perform well even in data-limited cohorts. Furthermore, PaReGTA is model-agnostic and can benefit from future EHR-specialized sentence-embedding models. For interpretability, we introduce PaReGTA-RSS (Representation Shift Score), which quantifies clinically defined factor importance by recomputing representati...
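Step (iii) of the abstract, hybrid temporal pooling, can be sketched as follows. The specific weighting choices here are assumptions for illustration: exponential decay as the recency term, and a softmax over embedding norms as a crude proxy for "globally informative" visits (the paper's actual saliency criterion is not reproduced from this abstract):

```python
import numpy as np

def hybrid_temporal_pooling(visit_embeddings, decay=0.9):
    """Pool per-visit embeddings into one fixed-dimensional patient vector.

    visit_embeddings: array of shape (n_visits, dim), ordered
    oldest -> newest. Returns a vector of shape (2 * dim,): a
    recency-weighted view concatenated with a saliency-weighted view.
    """
    n, _ = visit_embeddings.shape
    # Recency view: exponential decay, the most recent visit weighted highest.
    recency = decay ** np.arange(n - 1, -1, -1, dtype=float)
    recency /= recency.sum()
    recency_vec = recency @ visit_embeddings
    # Global view: softmax over embedding norms as a stand-in for
    # identifying globally informative visits.
    norms = np.linalg.norm(visit_embeddings, axis=1)
    saliency = np.exp(norms - norms.max())
    saliency /= saliency.sum()
    saliency_vec = saliency @ visit_embeddings
    # Concatenating both views yields a fixed-dimensional patient
    # representation regardless of the number of visits.
    return np.concatenate([recency_vec, saliency_vec])
```

Because the output dimension depends only on the embedding size, patients with different visit counts map to vectors of the same length, which is what makes the representation usable by standard downstream classifiers.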