[2602.18372] "How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
Summary
This paper examines the questions students pose to LLM chatbots, analyzing 6,113 messages from two learning contexts (formative self-study and summative assessed coursework). It finds that procedural questions predominate and assesses how reliably LLMs and human raters classify student questions under existing schemas.
Why It Matters
Understanding how students interact with LLM chatbots can inform the design of educational tools, enhancing their effectiveness in supporting learning. The findings highlight the need for improved classification methods to capture the complexity of student inquiries, which is crucial for developing better AI-driven educational resources.
Key Takeaways
- Procedural questions are the most common type of inquiry from students using LLM chatbots.
- The study analyzed 6,113 messages from two learning contexts: formative self-study and summative assessed coursework, revealing patterns in student questioning.
- LLM raters achieved moderate-to-good inter-rater reliability when classifying student questions, with higher consistency than human raters (see the reliability sketch after this list).
- Current classification schemas are limited and may not fully capture the nuances of student inquiries.
- Future research should focus on multi-turn conversation analysis to better understand student interactions.
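The reliability finding above rests on chance-corrected agreement between raters. As a rough illustration only (not the paper's actual pipeline), the sketch below computes Cohen's kappa between an LLM rater and a human rater using scikit-learn; the label lists and category names are hypothetical.

```python
# Minimal sketch: chance-corrected agreement between two raters.
# The labels and categories below are hypothetical and for illustration only;
# the paper's actual schemas, raters, and reliability statistics may differ.
from sklearn.metrics import cohen_kappa_score

# Hypothetical question-type labels assigned by an LLM rater and a human rater.
llm_labels = ["procedural", "procedural", "conceptual", "procedural", "verification"]
human_labels = ["procedural", "conceptual", "conceptual", "procedural", "verification"]

kappa = cohen_kappa_score(llm_labels, human_labels)
print(f"Cohen's kappa (LLM vs. human): {kappa:.2f}")
```

Values near 1.0 indicate strong agreement beyond chance; values near 0 indicate agreement no better than chance.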
Computer Science > Human-Computer Interaction
arXiv:2602.18372 (cs) [Submitted on 20 Feb 2026]
Title: "How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
Authors: Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson
Abstract: Providing scaffolding through educational chatbots built on Large Language Models (LLMs) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for...
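For context on how an LLM might be used as a rater, here is a minimal, hypothetical sketch of classifying a single student message into one of several question categories. The category names and the `ask_llm` helper are assumptions introduced for illustration; the paper's actual prompts, schemas, and models are not reproduced here.

```python
# Hypothetical sketch of an LLM-as-rater classification step.
# `ask_llm` stands in for any chat-completion call; it is not a real library function.
from typing import Callable

# Illustrative categories only, not the paper's four schemas.
CATEGORIES = ["procedural", "conceptual", "verification", "other"]

def classify_question(message: str, ask_llm: Callable[[str], str]) -> str:
    """Ask an LLM to label a student question with exactly one category name."""
    prompt = (
        "Classify the following student question into exactly one category: "
        + ", ".join(CATEGORIES) + ".\n"
        "Respond with the category name only.\n\n"
        f"Question: {message}"
    )
    answer = ask_llm(prompt).strip().lower()
    # Fall back to 'other' if the model returns something outside the schema.
    return answer if answer in CATEGORIES else "other"
```

Running such a classifier with several different LLMs and comparing the resulting labels to human ratings is one way to obtain the kind of inter-rater reliability figures the abstract reports.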