[2602.15844] Language Model Representations for Efficient Few-Shot Tabular Classification
Summary
This paper explores the use of language model representations for efficient few-shot classification of tabular data, proposing a new paradigm, Table Representation with Language Model (TaRL), that leverages already-deployed LLMs to improve performance in low-data scenarios.
Why It Matters
As structured data in the form of tables becomes increasingly prevalent on the web, the ability to classify this data efficiently is crucial. This research highlights how existing language models can be adapted to enhance classification tasks, potentially streamlining processes in various applications such as e-commerce and scientific research.
Key Takeaways
- The TaRL paradigm utilizes semantic embeddings for few-shot tabular classification.
- Naively applying these embeddings underperforms specialized tabular models, but targeted techniques close the gap.
- The proposed method achieves performance comparable to state-of-the-art models in low-data contexts.
- The research demonstrates the potential of reusing existing LLM infrastructure for better web table understanding.
- Calibrating softmax temperature and removing common components from embeddings are key to unlocking their potential.
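The two techniques in the last takeaway can be sketched concretely. The snippet below is an illustrative reconstruction, not the paper's exact recipe: the random embeddings stand in for real LLM row embeddings, the common-component removal follows the standard "project out the top principal directions" idea, and the temperature value is arbitrary.

```python
import numpy as np

def remove_common_component(X, k=1):
    """Center embeddings and project out the top-k principal directions,
    which often capture a dataset-wide 'common' component."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are principal directions of the centered matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    D = Vt[:k]                       # (k, dim) dominant directions
    return Xc - Xc @ D.T @ D         # remove those components from each row

def few_shot_predict(support, support_y, queries, temperature=0.1):
    """Nearest-centroid classification over cosine similarities,
    sharpened by a calibrated softmax temperature."""
    classes = np.unique(support_y)
    centroids = np.stack([support[support_y == c].mean(axis=0) for c in classes])
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (q @ c.T) / temperature  # lower temperature -> sharper softmax
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[probs.argmax(axis=1)], probs

# Hypothetical data: 20 "row embeddings" across 2 classes.
rng = np.random.default_rng(0)
emb = remove_common_component(rng.normal(size=(20, 64)))
labels = np.array([0] * 10 + [1] * 10)
preds, probs = few_shot_predict(emb[:16], labels[:16], emb[16:])
```

Here the few-shot "support set" is the 16 labeled rows and the remaining 4 rows are queries; in a real setting the embeddings would come from an existing LLM embedding endpoint rather than `rng.normal`.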
Computer Science > Computation and Language
arXiv:2602.15844 (cs)
Submitted on 21 Jan 2026

Title: Language Model Representations for Efficient Few-Shot Tabular Classification
Authors: Inwon Kang, Parikshit Ram, Yi Zhou, Horst Samulowitz, Oshani Seneviratne

Abstract: The Web is a rich source of structured data in the form of tables, from product catalogs and knowledge bases to scientific datasets. However, the heterogeneity of the structure and semantics of these tables makes it challenging to build a unified method that can effectively leverage the information they contain. Meanwhile, large language models (LLMs) are becoming an increasingly integral component of web infrastructure for tasks like semantic search. This raises a crucial question: can we leverage these already-deployed LLMs to classify structured data in web-native tables (e.g., product catalogs, knowledge base exports, scientific data portals), avoiding the need for specialized models or extensive retraining? This work investigates a lightweight paradigm, Table Representation with Language Model (TaRL), for few-shot tabular classification that directly utilizes semantic embeddings of individual table rows. We first show that naive application of these embeddings underperforms compared to specializ...