[2602.17314] Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE
Summary
This article surveys open datasets in learning analytics, identifying trends, challenges, and best practices to enhance research reproducibility and collaboration.
Why It Matters
Open datasets are essential for advancing research in learning analytics, educational data mining, and AI in education. This study highlights the current landscape of dataset availability, revealing gaps and providing practical guidelines to encourage more researchers to share their data, ultimately improving educational outcomes.
Key Takeaways
- Identified 172 datasets from 1,125 papers in learning analytics.
- 143 datasets were previously unreported, highlighting a significant gap.
- Provides a checklist of best practices for researchers to publish their data.
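The counts above can be turned into proportions with a few lines of arithmetic. The sketch below is illustrative only. It assumes the 143 previously unreported datasets are a subset of the 172 identified ones, and the variable names are hypothetical rather than taken from the paper.

```python
# Counts quoted in the key takeaways above.
papers_examined = 1125
datasets_found = 172
previously_unreported = 143  # assumed to be a subset of datasets_found

# Ratio of datasets to papers examined. This is not the share of papers
# that released data, since one paper may be linked to several datasets.
datasets_per_paper = datasets_found / papers_examined

# Share of the identified datasets that were previously unreported.
unreported_share = previously_unreported / datasets_found

# Remaining datasets, i.e. those that had been reported before.
previously_reported = datasets_found - previously_unreported

print(f"Datasets per paper examined: {datasets_per_paper:.1%}")
print(f"Previously unreported share: {unreported_share:.1%}")
print(f"Previously reported datasets: {previously_reported}")
```

Under these assumptions, roughly 15 datasets were found per 100 papers examined, and about five in six of them were previously unreported.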
arXiv:2602.17314 (cs), Computer Science > Computers and Society
Submitted on 19 Feb 2026
Title: Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE
Authors: Valdemar Švábenský, Brendan Flanagan, Erwin Daniel López Zapata, Atsushi Shimada
Abstract: Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researchers in these domains apply computational methods to analyze data from educational contexts, aiming to better understand and improve teaching and learning. Providing open datasets alongside research papers supports reproducibility, collaboration, and trust in research findings. It also provides individual benefits for authors, such as greater visibility, credibility, and citation potential. Despite these advantages, the availability of open datasets and the associated practices within the learning analytics research communities, especially at their flagship conference venues, remain unclear. We surveyed available datasets published alongside research papers in learning analytics. We manually examined 1,125 papers from three flagship conferences (LAK, EDM, and AIED) over the past five years. We discovered, categorized, and analyzed 17...