[2506.21138] Multi-Sample Prompting and Actor-Critic Prompt Optimization for Diverse Synthetic Data Generation
Computer Science > Software Engineering
arXiv:2506.21138 (cs)
[Submitted on 26 Jun 2025 (v1), last revised 28 Mar 2026 (this version, v2)]
Authors: Abdelkarim El-Hajjami, Camille Salinesi

Abstract: High-quality labeled datasets are fundamental for training and evaluating machine learning models, yet domains such as healthcare and Requirements Engineering (RE) face persistent barriers due to data scarcity, privacy constraints, or proprietary restrictions. While Large Language Models (LLMs) offer a promising avenue for Synthetic Data Generation (SDG), LLM-generated data tends to be repetitive and low in diversity, reducing its effectiveness for downstream tasks. Two approaches show potential for addressing this limitation: (1) multi-sample prompting, which generates multiple samples per prompt to reduce repetition, and (2) Prompt with Actor-Critic Editing (PACE), which iteratively refines prompts to maximize diversity. We integrate both mechanisms into Synthline, a Feature Model-based configurable synthetic data generator, and assess their effects on diversity and downstream utility across four RE classification tasks. Multi-sample prompting consistently improves both diver...
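
To make the idea of multi-sample prompting concrete, the following is a minimal sketch, not Synthline's implementation. It assumes a prompt that asks for several numbered samples in one request, a parser for the numbered reply, and distinct-n as a stand-in diversity measure. The function names, the prompt wording, and the choice of distinct-n are all illustrative assumptions; the abstract does not state which prompt template or diversity metric the authors use.

```python
import re


def build_multi_sample_prompt(task: str, label: str, n_samples: int) -> str:
    """Build one prompt requesting n_samples items (hypothetical wording)."""
    return (
        f"Generate {n_samples} distinct {task} examples labeled '{label}'.\n"
        "Make each example differ in wording and topic from the others.\n"
        "Return one example per line, numbered 1., 2., ..."
    )


def parse_numbered_samples(response: str) -> list[str]:
    """Extract the text of each numbered line from a model response."""
    samples = []
    for line in response.splitlines():
        # Accept "1. text" or "1) text"; skip lines without a leading number.
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            samples.append(match.group(1))
    return samples


def distinct_n(texts: list[str], n: int = 2) -> float:
    """Share of unique word n-grams across texts (assumed diversity proxy)."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

In use, the prompt string would be sent to whichever LLM client is available, and the reply passed through `parse_numbered_samples` before scoring with `distinct_n`. A higher score indicates less repeated phrasing across the generated samples.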