[2512.06227] Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety
Computer Science > Computation and Language
arXiv:2512.06227 (cs)
[Submitted on 6 Dec 2025 (v1), last revised 3 Mar 2026 (this version, v2)]
Title: Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety
Authors: Junyu Mao, Anthony Hills, Talia Tseriotou, Maria Liakata, Aya Shamir, Dan Sayda, Dana Atzil-Slonim, Natalie Djohari, Arpan Mandal, Silke Roth, Pamela Ugwudike, Mahesan Niranjan, Stuart E. Middleton
Abstract: Real-world indicators play an important role in many natural language processing (NLP) applications, such as life-event for mental health analysis and risky behaviour for online safety, yet labelling such information in training datasets is often costly and/or difficult due to their dynamic nature. Large language models (LLMs) show promising potential for automated annotation, yet multi-label prediction remains challenging. In this work, we propose a Confidence-Aware Fine-Grained Debate (CFD) framework that simulates collaborative annotation using fine-grained information to better support automated multi-label enrichment. We introduce two new expert-annotated resources: A mental health Reddit well-being dataset and an online safety Facebook sharenting risk dataset. Experim...