[2602.15968] From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Summary
This article presents a scoping review of 59 publications on dataset documentation tools, examining the motivations behind tool design and the factors that hinder adoption. It proposes shifting from individual to institutional solutions to make documentation practices sustainable.
Why It Matters
Dataset documentation is crucial for responsible AI development, so understanding what blocks the adoption of documentation tools can help improve practice in the field. This review highlights persistent issues that must be addressed before documentation can be better integrated into the development of automated systems.
Key Takeaways
- Identifies four persistent patterns that may impede the adoption and standardization of dataset documentation: unclear value, decontextualized designs, unaddressed labor demands, and future integration challenges.
- Advocates for a shift in focus from individual to institutional solutions in documentation tool design.
- Calls for the HCI community to take actionable steps to support sustainable documentation practices.
Computer Science > Software Engineering
arXiv:2602.15968 (cs)
[Submitted on 17 Feb 2026]
Title: From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Authors: Pedro Reynolds-Cuéllar (Robotics and AI Institute), Marisol Wong-Villacres (Escuela Superior Politécnica del Litoral), Adriana Alvarado Garcia (IBM Research), Heila Precel (Robotics and AI Institute)
Abstract: Dataset documentation is widely recognized as essential for the responsible development of automated systems. Despite growing efforts to support documentation through different kinds of artifacts, little is known about the motivations shaping documentation tool design or the factors hindering their adoption. We present a systematic review supported by mixed-methods analysis of 59 dataset documentation publications to examine the motivations behind building documentation tools, how authors conceptualize documentation practices, and how these tools connect to existing systems, regulations, and cultural norms. Our analysis shows four persistent patterns in dataset documentation conceptualization that potentially impede adoption and standardization: unclear operationalizations of documentation's value, decontextualized designs, unaddressed labor demands, and a tendency to treat integration...