[2602.18520] Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams

[2602.18520] Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams

arXiv - AI 4 min read Article

Summary

The paper presents Sketch2Feedback, a framework that enhances feedback on student-drawn STEM diagrams by integrating grammar rules to reduce hallucination in AI models.

Why It Matters

This research addresses a critical challenge in STEM education by improving the reliability of AI-generated feedback on student diagrams. The framework's ability to minimize hallucination rates while providing actionable insights can significantly enhance learning outcomes and trust in AI tools in educational settings.

Key Takeaways

  • Sketch2Feedback employs a grammar-in-the-loop approach to enhance feedback accuracy.
  • The framework reduces hallucination rates in AI feedback, improving trustworthiness.
  • Evaluation shows that the grammar-based method provides more actionable insights than traditional end-to-end models.

Computer Science > Computer Vision and Pattern Recognition arXiv:2602.18520 (cs) [Submitted on 19 Feb 2026] Title:Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams Authors:Aayam Bansal View a PDF of the paper titled Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams, by Aayam Bansal View PDF HTML (experimental) Abstract:Providing timely, rubric-aligned feedback on student-drawn diagrams is a persistent challenge in STEM education. While large multimodal models (LMMs) can jointly parse images and generate explanations, their tendency to hallucinate undermines trust in classroom deployments. We present Sketch2Feedback, a grammar-in-the-loop framework that decomposes the problem into four stages -- hybrid perception, symbolic graph construction, constraint checking, and constrained VLM feedback -- so that the language model verbalizes only violations verified by an upstream rule engine. We evaluate on two synthetic micro-benchmarks, FBD-10 (free-body diagrams) and Circuit-10 (circuit schematics), each with 500 images spanning standard and hard noise augmentation tiers, comparing our pipeline against end-to-end LMMs (LLaVA-1.5-7B, Qwen2-VL-7B), a vision-only detector, a YOLOv8-nano learned detector, and an ensemble oracle. On n=100 test samples per benchmark with 95% bootstrap CIs, results are mixed and instructive: Qwen2-VL-7B achieves the highest micro-F1 on both FBDs (0.570) a...

Related Articles

AI chip startup Rebellions raises $400 million at $2.3B valuation in pre-IPO round | TechCrunch
Machine Learning

AI chip startup Rebellions raises $400 million at $2.3B valuation in pre-IPO round | TechCrunch

The startup, which is planning to go public later this year, designs chips specifically for AI inference, another challenger to Nvidia's ...

TechCrunch - AI · 4 min ·
Llms

CLI for Google AI Search (gai.google) — run AI-powered code/tech searches headlessly from your terminal

Google AI (gai.google) gives Gemini-powered answers for technical queries — think AI-enhanced search with code understanding. I built a C...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Big increase in the amount of people using AI to write their replies with AI

I find it interesting that we’ve all randomly decided to use the “-“ more often recently on reddit, and everyone’s grammar has drasticall...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[D] MXFP8 GEMM: Up to 99% of cuBLAS performance using CUDA + PTX

New blog post by Daniel Vega-Myhre (Meta/PyTorch) illustrating GEMM design for FP8, including deep-dives into all the constraints and des...

Reddit - Machine Learning · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime