[2601.08950] ConvoLearn: A Dataset for Fine-Tuning Dialogic AI Tutors
Computer Science > Artificial Intelligence
arXiv:2601.08950 (cs)
[Submitted on 13 Jan 2026 (v1), last revised 6 Apr 2026 (this version, v2)]

Title: ConvoLearn: A Dataset for Fine-Tuning Dialogic AI Tutors
Authors: Mayank Sharma, Roy Pea, Hari Subramonyam

Abstract: Despite their growing adoption in education, LLMs remain misaligned with the core principle of effective tutoring: the dialogic construction of knowledge. We introduce ConvoLearn, a dataset of 2,134 semi-synthetic tutor-student dialogues operationalizing six dimensions of dialogic tutoring grounded in knowledge-building theory, situated in middle school Earth Science curriculum. We show that dimension-labeled dialogic training data captures meaningful pedagogical signal that generalizes beyond its semi-synthetic domain: scores from a classifier trained on ConvoLearn correlate significantly with expert-coded instructional quality in authentic classrooms across multiple subscales (range r = .118-.258, all p < .05). As a proof of concept, we fine-tune Mistral-7B on ConvoLearn and show that dimension-level fine-tuning can steer a 7B open-weight model toward dialogic tutoring behavior that credentialed teachers rate as competitive with a strong proprietary baseline. With this work, we support the development of AI tutors capable of more dialogic interactions.

Subjects:...