[2602.19661] PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information


Summary

The paper presents PaReGTA, an LLM-based framework for encoding temporal information in electronic health records (EHRs), enhancing patient representation and classification accuracy.

Why It Matters

This research addresses the challenge of capturing temporal data in EHRs, which is crucial for improving patient care and outcomes. By utilizing a lightweight, pre-trained LLM approach, PaReGTA offers a scalable solution that can be applied to various healthcare datasets, making it relevant for researchers and practitioners in the field of health informatics.

Key Takeaways

  • PaReGTA encodes longitudinal EHR events into structured templates with temporal cues.
  • The framework uses lightweight contrastive fine-tuning for domain-adapted embeddings.
  • It aggregates visit embeddings to create a fixed-dimensional patient representation.
  • PaReGTA outperforms traditional sparse models in migraine classification tasks.
  • The framework is model-agnostic and can leverage future EHR-specialized sentence-embedding models.
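The first step above, rendering each visit as templated text with explicit temporal cues, can be sketched as follows. This is a hypothetical illustration only: the summary does not show the paper's actual templates, so the `template_visit` function, its wording, and the "days before index date" cue are assumptions.

```python
from datetime import date

def template_visit(visit_date: date, index_date: date, events: list[str]) -> str:
    """Render one EHR visit as templated text with an explicit temporal cue.

    Hypothetical template: the paper's exact wording is not given in this
    summary, so this is only a sketch of the general idea.
    """
    days_before = (index_date - visit_date).days
    event_text = "; ".join(events)
    return f"Visit {days_before} days before index date: {event_text}."

# Example: one visit for one patient, 28 days before the index date
index = date(2024, 6, 1)
text = template_visit(date(2024, 5, 4), index,
                      ["diagnosis: migraine", "rx: sumatriptan"])
print(text)
# -> Visit 28 days before index date: diagnosis: migraine; rx: sumatriptan.
```

Texts like these would then be fed to a sentence-embedding model to produce one vector per visit.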

Computer Science > Machine Learning · arXiv:2602.19661 (cs) · Submitted on 23 Feb 2026

Title: PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information

Authors: Kihyuk Yoon, Lingchao Mao, Catherine Chong, Todd J. Schwedt, Chia-Chun Chiang, Jing Li

Abstract: Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose PaReGTA, an LLM-based encoding framework that (i) converts longitudinal EHR events into visit-level templated text with explicit temporal cues, (ii) learns domain-adapted visit embeddings via lightweight contrastive fine-tuning of a sentence-embedding model, and (iii) aggregates visit embeddings into a fixed-dimensional patient representation using hybrid temporal pooling that captures both recency and globally informative visits. Because PaReGTA does not require training from scratch but instead utilizes a pre-trained LLM, it can perform well even in data-limited cohorts. Furthermore, PaReGTA is model-agnostic and can benefit from future EHR-specialized sentence-embedding models. For interpretability, we introduce PaReGTA-RSS (Representation Shift Score), which quantifies clinically defined factor importance by recomputing representati...
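The abstract's third step, hybrid temporal pooling, aggregates per-visit embeddings into one fixed-dimensional patient vector capturing "both recency and globally informative visits". The summary does not specify the pooling operators, so the sketch below is one plausible reading, assuming a recency-weighted average (exponential decay toward older visits) concatenated with an element-wise max over all visits; the function name and the `decay` parameter are hypothetical.

```python
def hybrid_temporal_pool(visit_embs: list[list[float]], decay: float = 0.8) -> list[float]:
    """Pool T visit embeddings (each of dimension d, oldest first) into one
    fixed 2*d vector.

    Hypothetical sketch: a recency-weighted mean (newest visit weighted
    highest) concatenated with an element-wise max over visits, so the
    output reflects both recent and globally salient features.
    """
    T, d = len(visit_embs), len(visit_embs[0])
    # Exponential recency weights: most recent visit gets weight decay**0.
    raw = [decay ** (T - 1 - t) for t in range(T)]
    total = sum(raw)
    weights = [w / total for w in raw]
    recency = [sum(weights[t] * visit_embs[t][j] for t in range(T)) for j in range(d)]
    global_max = [max(visit_embs[t][j] for t in range(T)) for j in range(d)]
    return recency + global_max

# Three 2-d visit embeddings for one patient, oldest first
patient = hybrid_temporal_pool([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
# patient has fixed length 2 * d = 4, regardless of the number of visits
```

Whatever the exact operators, the key design point is that the output dimension is independent of the number of visits, which is what allows a standard classifier to consume patient histories of varying length.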
