[2510.12728] Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior
Computer Science > Human-Computer Interaction

arXiv:2510.12728 (cs)

[Submitted on 14 Oct 2025 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

Authors: Minjae Lee, Minsuk Kahng

Abstract: Large Language Models (LLMs) are increasingly embedded in applications, and people can shape model behavior by editing prompt instructions. Yet encoding subtle, domain-specific policies into prompts is challenging. Although this process often benefits from concrete test cases, test data and prompt instructions are typically developed as separate artifacts, reflecting traditional machine learning practices in which model tuning was slow and test sets were static. We argue that the fast, iterative nature of prompt engineering calls for removing this separation and enabling a new workflow: data-prompt co-evolution, where a living test set and prompt instructions evolve in tandem. We present an interactive system that operationalizes this workflow. It guides application developers to discover edge cases, articulate rationales for desired behavior, and iteratively evaluate revised prompts against a growing test set. A user study shows our workflow helps people refine prompts systematically, better aligning them with their intended policies. This work points toward...
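The core loop the abstract describes — a living test set of edge cases, each paired with a rationale, re-run against every revised prompt — can be sketched in a few lines of Python. This is a minimal illustration, not the paper's system: `TestCase`, `LivingTestSet`, and the `toy_model` stand-in for an LLM call are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TestCase:
    # A discovered edge case, the desired behavior, and the policy rationale behind it.
    input_text: str
    expected: str
    rationale: str

@dataclass
class LivingTestSet:
    cases: List[TestCase] = field(default_factory=list)

    def add(self, case: TestCase) -> None:
        # The test set grows as new edge cases are discovered during prompt iteration.
        self.cases.append(case)

    def evaluate(self, run_prompt: Callable[[str, str], str],
                 prompt: str) -> List[Tuple[TestCase, str]]:
        # Run the candidate prompt over every stored case; collect failures for inspection.
        failures = []
        for case in self.cases:
            output = run_prompt(prompt, case.input_text)
            if output != case.expected:
                failures.append((case, output))
        return failures

# Stand-in for a real LLM call (assumption: the actual system queries a model API).
def toy_model(prompt: str, text: str) -> str:
    if "refuse medical advice" in prompt and "diagnose" in text:
        return "refuse"
    return "answer"

tests = LivingTestSet()
tests.add(TestCase("Can you diagnose my rash?", "refuse",
                   "Medical diagnoses are out of policy."))
tests.add(TestCase("What is aspirin?", "answer",
                   "General drug facts are allowed."))

v1 = "You are a helpful assistant."
v2 = "You are a helpful assistant. Always refuse medical advice requests."

print(len(tests.evaluate(toy_model, v1)))  # 1 — v1 misses the diagnosis edge case
print(len(tests.evaluate(toy_model, v2)))  # 0 — the revised prompt passes the grown set
```

The point of keeping both artifacts in one loop is that a failing case motivates a prompt revision, and the revision is immediately checked against every previously discovered case, so fixes do not silently regress earlier policies.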