[2602.18518] Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling
Summary
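This paper presents a measurement system that estimates the prevalence of policy-violating content (the fraction of user impressions that went to violating content on a given day) by combining ML-assisted probability sampling with multimodal LLM labeling. The design targets two obstacles: violations are often rare, and human labeling is too costly for frequent, platform-representative studies.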
Why It Matters
Content safety teams need metrics that reflect what users actually see, not only what gets reported. Prevalence provides that view, but measuring it is slow and expensive when violations are rare and labels come from human reviewers. The study shows how ML-assisted sampling and LLM labeling can make frequent, platform-wide measurement practical for services that host user-generated content, while keeping the estimates unbiased.
Key Takeaways
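- The study proposes a design-based measurement system that draws daily probability samples from the impression stream and produces design-consistent prevalence estimates with confidence intervals.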
- ML-assisted sampling weights concentrate the labeling budget on high-exposure and high-risk content while preserving the unbiasedness of prevalence estimates.
- A single global daily sample supports many pivots, giving prevalence by surface, viewer geography, content age, and other segments without running separate studies.
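To make the first three takeaways concrete, here is a minimal sketch of how ML-assisted weights and one sample with many pivots can coexist with unbiased estimation. It is an illustration under stated assumptions, not the authors' implementation: it assumes content-level Poisson sampling with inclusion probability proportional to impressions times a model risk score (floored so no item with impressions has zero probability), a Horvitz-Thompson estimator for violating impressions, and per-segment impression totals known exactly from logs. `Item`, `inclusion_probs`, `draw_sample`, `estimate_prevalence`, `synthetic_day`, the `budget` and `floor` parameters, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One content item on one day (hypothetical schema)."""
    impressions: int    # exposure v_i, logged for every item
    risk_score: float   # ML violation score in [0, 1], assumed available for every item
    pivots: dict        # segment values, e.g. {"surface": "feed", "geo": "US"}
    violates: bool      # label y_i; in practice only observed for sampled items


def inclusion_probs(items, budget, floor=0.01):
    """pi_i proportional to impressions x floored risk score, capped at 1.

    Expected sample size is at most `budget`. The floor keeps pi_i > 0 for every
    item with impressions, which the Horvitz-Thompson estimator needs to be unbiased.
    """
    size = [it.impressions * max(it.risk_score, floor) for it in items]
    total = sum(size)
    return [min(1.0, budget * s / total) for s in size]


def draw_sample(items, probs, rng):
    """Poisson sampling: item i is included independently with probability pi_i."""
    return [(it, p) for it, p in zip(items, probs) if rng.random() < p]


def estimate_prevalence(items, sample, z=1.96):
    """Impression-weighted prevalence with ~95% normal CIs, overall and per pivot value.

    Segment impression totals are known from logs, so only the numerator
    (violating impressions) is estimated from the sample.
    """
    def keys(it):
        return [("all", "all")] + list(it.pivots.items())

    denom = defaultdict(int)
    for it in items:
        for k in keys(it):
            denom[k] += it.impressions

    num, var = defaultdict(float), defaultdict(float)
    for it, p in sample:
        zi = it.impressions if it.violates else 0   # v_i * y_i
        for k in keys(it):
            num[k] += zi / p                         # Horvitz-Thompson weighting
            var[k] += (1 - p) * zi ** 2 / p ** 2     # HT variance estimator under Poisson sampling

    out = {}
    for k, d in denom.items():
        est = num[k] / d
        half = z * math.sqrt(var[k]) / d
        out[k] = (est, est - half, est + half)
    return out


def synthetic_day(n_items=20_000, seed=0):
    """Synthetic population: violations are rare and correlate with the risk score."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        risk = rng.random() ** 4                     # most content scores low
        items.append(Item(
            impressions=rng.randint(1, 1000),
            risk_score=risk,
            pivots={"surface": rng.choice(["feed", "search"]),
                    "geo": rng.choice(["US", "EU", "other"])},
            violates=rng.random() < 0.2 * risk,
        ))
    return items


if __name__ == "__main__":
    items = synthetic_day()
    truth = (sum(it.impressions for it in items if it.violates)
             / sum(it.impressions for it in items))
    probs = inclusion_probs(items, budget=500)
    sample = draw_sample(items, probs, random.Random(1))
    print(f"labeled {len(sample)} of {len(items)} items; true prevalence {truth:.4f}")
    for key, (est, lo, hi) in sorted(estimate_prevalence(items, sample).items()):
        print(key, f"{est:.4f} [{lo:.4f}, {hi:.4f}]")
```

Because each sampled item is weighted by 1/pi_i, oversampling risky, high-exposure content changes the variance of the estimate but not its expected value. The floor matters: an item with zero inclusion probability could never be observed, which would bias prevalence downward. Every pivot reuses the same sample, so a small segment gets a wider interval rather than needing its own study.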
arXiv:2602.18518 (Computer Science > Machine Learning). Submitted on 19 Feb 2026.
Title: Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling
Authors: Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq
Abstract: Content safety teams need metrics that reflect what users actually experience, not only what is reported. We study prevalence: the fraction of user views (impressions) that went to content violating a given policy on a given day. Accurate prevalence measurement is challenging because violations are often rare and human labeling is costly, making frequent, platform-representative studies slow. We present a design-based measurement system that (i) draws daily probability samples from the impression stream using ML-assisted weights to concentrate label budget on high-exposure and high-risk content while preserving unbiasedness, (ii) labels sampled items with a multimodal LLM governed by policy prompts and gold-set validation, and (iii) produces design-consistent prevalence estimates with confidence intervals and dashboard drilldowns. A key design goal is one global sample with many pivots: the same daily sample supports prevalence by surface, viewer geography, content age, and other segments through post...
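The abstract says LLM labels are governed by policy prompts and gold-set validation but does not spell out the check. A minimal sketch of one plausible version follows, assuming human gold labels keyed by item id and illustrative thresholds: it compares LLM labels with the gold labels for one policy prompt, reports precision, recall, accuracy, and the gap between the LLM's and the gold set's positive rates, and gates the prompt on them. `GoldReport`, `validate_against_gold`, and all threshold values are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldReport:
    n: int             # gold items that also received an LLM label
    precision: float
    recall: float
    accuracy: float
    rate_bias: float   # LLM positive rate minus gold positive rate
    passed: bool


def validate_against_gold(gold, llm, min_precision=0.9, min_recall=0.8, max_rate_bias=0.01):
    """Score LLM labels for one policy prompt against human gold labels.

    gold, llm: dicts mapping item id -> True if the item violates the policy.
    The metrics and default thresholds are illustrative assumptions.
    """
    ids = gold.keys() & llm.keys()          # only items labeled by both
    n = len(ids)
    tp = sum(gold[i] and llm[i] for i in ids)
    fp = sum(llm[i] and not gold[i] for i in ids)
    fn = sum(gold[i] and not llm[i] for i in ids)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (n - fp - fn) / n if n else 0.0
    rate_bias = (fp - fn) / n if n else 0.0  # > 0 means the LLM over-flags
    passed = (precision >= min_precision and recall >= min_recall
              and abs(rate_bias) <= max_rate_bias)
    return GoldReport(n, precision, recall, accuracy, rate_bias, passed)


if __name__ == "__main__":
    gold = {1: True, 2: False, 3: True, 4: False, 5: False}
    llm = {1: True, 2: True, 3: False, 4: False, 5: False}
    # -> precision=0.5, recall=0.5, accuracy=0.6, rate_bias=0.0, passed=False
    print(validate_against_gold(gold, llm))
```

The rate-bias term is included because prevalence is a rate: false positives and false negatives partly cancel in it, so a labeler with balanced errors can still yield a nearly unbiased prevalence, while one that systematically over- or under-flags shifts every estimate. Since a gold set's content mix usually differs from the daily sample, this gap is only an indicator of label bias in production.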