[2603.11679] LLMs can construct powerful representations and streamline sample-efficient supervised learning
Computer Science > Artificial Intelligence
arXiv:2603.11679 (cs)
[Submitted on 12 Mar 2026 (v1), last revised 21 Mar 2026 (this version, v2)]

Title: LLMs can construct powerful representations and streamline sample-efficient supervised learning
Authors: Ilker Demirel, Lawrence Shi, Zeshan Hussain, David Sontag

Abstract: As real-world datasets become increasingly complex and heterogeneous, supervised learning is often bottlenecked by input representation design. Modeling multimodal data, such as time-series, free text, and structured records, for downstream tasks often requires non-trivial domain-specific engineering. We propose an agentic pipeline to streamline this process. First, an LLM analyzes a small but diverse subset of text-serialized input examples in-context to synthesize a global rubric, which acts as a programmatic specification for extracting and organizing evidence. This rubric is then used to transform naive text-serializations of inputs into a more standardized format for downstream models. We also describe local rubrics, which are task-conditioned summaries generated by an LLM. Across 15 clinical tasks from the EHRSHOT benchmark, our rubric-based approaches significantly outperform traditional count-feature models, naive text-serialization-based LLM baselines, and a clinical foundation model…
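Below is a minimal sketch of the two-stage rubric pipeline the abstract describes, assuming a generic text-in/text-out LLM interface. The function names, prompts, and the `complete` callable are illustrative assumptions for exposition, not the authors' implementation.

```python
import random
from typing import Callable, Sequence


def synthesize_global_rubric(
    complete: Callable[[str], str],      # any text-in/text-out LLM call (assumed interface)
    serialized_examples: Sequence[str],  # naive text-serializations of raw inputs
    sample_size: int = 20,
    seed: int = 0,
) -> str:
    """Stage 1: show the LLM a small, diverse subset of examples in-context
    and ask it to write a reusable extraction/organization specification."""
    rng = random.Random(seed)
    sample = rng.sample(list(serialized_examples),
                        min(sample_size, len(serialized_examples)))
    prompt = (
        "You will see a sample of serialized records.\n"
        "Write a rubric: a step-by-step specification for extracting and\n"
        "organizing the evidence in any such record into a standard format.\n\n"
        + "\n---\n".join(sample)
    )
    return complete(prompt)


def apply_global_rubric(complete: Callable[[str], str],
                        rubric: str, record: str) -> str:
    """Stage 2: restructure one naive serialization according to the rubric,
    yielding a standardized representation for downstream models."""
    prompt = (
        f"Rubric:\n{rubric}\n\n"
        f"Record:\n{record}\n\n"
        "Rewrite the record in the format the rubric specifies."
    )
    return complete(prompt)


def local_rubric_summary(complete: Callable[[str], str],
                         task_description: str, record: str) -> str:
    """Local rubric (per the abstract): a task-conditioned summary
    of a single record, generated by the LLM."""
    prompt = (
        f"Task: {task_description}\n\n"
        f"Record:\n{record}\n\n"
        "Summarize only the evidence relevant to this task."
    )
    return complete(prompt)
```

In this sketch, the standardized outputs of apply_global_rubric (or the task-conditioned local-rubric summaries) would then be featurized or embedded and fed to a small downstream supervised model, matching the sample-efficient setting the abstract targets.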