[2602.17314] Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

arXiv - Machine Learning, February 20, 2026

Summary

This article surveys open datasets in learning analytics, identifying trends, challenges, and best practices to enhance research reproducibility and collaboration.

Why It Matters

Open datasets are essential for advancing research in learning analytics, educational data mining, and AI in education. This study maps the current availability of datasets in these fields, reveals gaps, and provides practical guidelines to encourage more researchers to share their data, with the broader aim of improving teaching and learning.

Key Takeaways

Identified 172 datasets in 1,125 papers from three flagship learning analytics conferences (LAK, EDM, and AIED) over the past five years.
143 datasets were previously unreported, highlighting a significant gap.
Provides a checklist of best practices for researchers publishing their data.

arXiv:2602.17314 (cs, Computers and Society), submitted on 19 Feb 2026

Title: Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

Authors: Valdemar Švábenský, Brendan Flanagan, Erwin Daniel López Zapata, Atsushi Shimada

Abstract: Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researchers in these domains apply computational methods to analyze data from educational contexts, aiming to better understand and improve teaching and learning. Providing open datasets alongside research papers supports reproducibility, collaboration, and trust in research findings. It also provides individual benefits for authors, such as greater visibility, credibility, and citation potential. Despite these advantages, the availability of open datasets and the associated practices within the learning analytics research communities, especially at their flagship conference venues, remain unclear. We surveyed available datasets published alongside research papers in learning analytics. We manually examined 1,125 papers from three flagship conferences (LAK, EDM, and AIED) over the past five years. We discovered, categorized, and analyzed 17...

