[2604.08578] Structured Exploration and Exploitation of Label Functions for Automated Data Annotation
Computer Science > Machine Learning
arXiv:2604.08578 (cs)
[Submitted on 28 Mar 2026]

Title: Structured Exploration and Exploitation of Label Functions for Automated Data Annotation
Authors: Phong Lam, Ha-Linh Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

Abstract: High-quality labeled data is critical for training reliable machine learning and deep learning models, yet manual annotation remains costly and error-prone. Programmatic labeling addresses this challenge by using label functions (LFs), i.e., heuristic rules that automatically generate weak labels for training datasets. However, existing automated LF generation methods either rely on large language models (LLMs) to synthesize surface-level heuristics or employ model-based synthesis over hand-crafted primitives. These approaches often result in limited coverage and unreliable label quality. In this paper, we introduce EXPONA, an automated framework for programmatic labeling that formulates LF generation as a principled process balancing diversity and reliability. EXPONA systematically explores multi-level LFs, spanning surface, structural, and semantic perspectives. EXPONA further applies reliability-aware mechanisms to suppress noisy or redundant heuristics while preserving complementary signals. To evaluate EXPONA, we conducted ext...
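
To make the notion of a label function concrete, the following is a minimal Python sketch of programmatic labeling in general. It is not EXPONA's method: the three heuristics, the spam/ham task, the ABSTAIN convention, and the majority-vote aggregation are all assumptions chosen for illustration. Each LF either votes for a class or abstains, and the votes are combined into one weak label per example. The coverage helper shows why a small set of surface-level heuristics can leave many examples unlabeled.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label values for an illustrative spam task (not from the paper).
ABSTAIN = -1
HAM = 0
SPAM = 1

LabelFunction = Callable[[str], int]


def lf_contains_link(text: str) -> int:
    """Surface heuristic: messages containing a URL are voted SPAM."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    """Surface heuristic: messages mentioning a prize are voted SPAM."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> int:
    """Surface heuristic: very short messages without links are voted HAM."""
    return HAM if len(text.split()) <= 5 and "http" not in text else ABSTAIN


def majority_vote(text: str, lfs: List[LabelFunction]) -> int:
    """Combine LF votes into one weak label; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting LFs with equal support
    return ranked[0][0]


def coverage(texts: List[str], lfs: List[LabelFunction]) -> float:
    """Fraction of examples that receive a non-abstain weak label."""
    if not texts:
        return 0.0
    labeled = sum(1 for t in texts if majority_vote(t, lfs) != ABSTAIN)
    return labeled / len(texts)


LFS: List[LabelFunction] = [lf_contains_link, lf_mentions_prize, lf_short_message]

if __name__ == "__main__":
    samples = [
        "Win a prize at http://example.com now, click here today",
        "hi there",
        "The quarterly report is attached for your review",
        "prize inside",
    ]
    for s in samples:
        print(majority_vote(s, LFS), s)
    print("coverage:", coverage(samples, LFS))
```

In this toy setup, an example that matches no heuristic, or on which the heuristics disagree evenly, receives no weak label. That is a small-scale picture of the coverage and reliability concerns the abstract raises.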