[2602.18451] Developing a Multi-Agent System to Generate Next Generation Science Assessments with Evidence-Centered Design
Summary
This paper describes a Multi-Agent System (MAS) that automatically generates science assessment items aligned with the Next Generation Science Standards (NGSS), using the Evidence-Centered Design (ECD) framework.
Why It Matters
As education increasingly emphasizes performance-based assessment, this research addresses the difficulty of creating high-quality, standards-aligned assessments. By integrating AI with ECD, the study explores a scalable way to reduce the cost and labor of expert-driven item development, while showing that human expertise is still needed.
Key Takeaways
- The study integrates Evidence-Centered Design with Multi-Agent Systems for automated assessment generation.
- AI-generated items were comparable in quality to human-developed items in their alignment with the NGSS.
- AI-generated items performed well on inclusivity but were weaker in clarity and multimodal design.
- Both AI-generated and human-developed items showed weaknesses in evidence collectability and in alignment with student interests.
- Human expertise remains crucial despite advancements in automated assessment generation.
arXiv:2602.18451 (cs), Computers and Society
Submitted on 3 Feb 2026
Authors: Yaxuan Yang, Jongchan Park, Yifan Zhou, Xiaoming Zhai
Abstract: Contemporary science education reforms such as the Next Generation Science Standards (NGSS) demand assessments to understand students' ability to use science knowledge to solve problems and design solutions. To elicit such higher-order ability, educators need performance-based assessments, which are challenging to develop. One solution that has been broadly adopted is Evidence-Centered Design (ECD), which emphasizes interconnected models of the learner, evidence, and tasks. Although ECD provides a framework to safeguard assessment validity, its implementation requires diverse expertise (e.g., content and assessment), which is both costly and labor-intensive. To address this challenge, this study proposed integrating the ECD framework into Multi-Agent Systems (MAS) to generate NGSS-aligned assessment items automatically. This integrated MAS system ensembles multiple large language models with varying expertise, enabling the automation of complex, multi-stage item generation workflows traditionally per...
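The abstract describes ECD's three interconnected models (learner, evidence, task) and an MAS that chains agents with different expertise through a multi-stage workflow. The sketch below is a minimal illustration under our own assumptions, not the authors' system: the class names, the four agent roles, and the `PE-1` target label are all hypothetical, and each agent is a deterministic stub standing in for what would be a large language model call in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StudentModel:
    """ECD student (learner) model: the claim to be made about the learner."""
    claim: str


@dataclass
class EvidenceModel:
    """ECD evidence model: observable behaviors that would support the claim."""
    observables: List[str]


@dataclass
class TaskModel:
    """ECD task model: the situation meant to elicit those observables."""
    prompt: str
    elicits: List[str]


@dataclass
class ItemDraft:
    target: str  # hypothetical label for the targeted standard
    student: Optional[StudentModel] = None
    evidence: Optional[EvidenceModel] = None
    task: Optional[TaskModel] = None
    log: List[str] = field(default_factory=list)
    needs_human_review: bool = True  # automated drafts still go to a human


# Hypothetical agent roles: each stub stands in for a specialized LLM call.
Agent = Callable[[ItemDraft], ItemDraft]


def learner_agent(d: ItemDraft) -> ItemDraft:
    d.student = StudentModel(claim=f"Student can demonstrate {d.target}")
    d.log.append("learner_agent")
    return d


def evidence_agent(d: ItemDraft) -> ItemDraft:
    assert d.student is not None, "evidence depends on the student model"
    d.evidence = EvidenceModel(
        observables=[f"explanation supporting: {d.student.claim}",
                     "cites data as evidence"]
    )
    d.log.append("evidence_agent")
    return d


def task_agent(d: ItemDraft) -> ItemDraft:
    assert d.evidence is not None, "the task depends on the evidence model"
    d.task = TaskModel(prompt=f"Scenario targeting {d.target}",
                       elicits=list(d.evidence.observables))
    d.log.append("task_agent")
    return d


def review_agent(d: ItemDraft) -> ItemDraft:
    # Check that the chain is connected: every observable is elicited by the task.
    assert d.evidence is not None and d.task is not None
    missing = [o for o in d.evidence.observables if o not in d.task.elicits]
    d.log.append("review_agent: ok" if not missing
                 else f"review_agent: missing {missing}")
    return d


PIPELINE: List[Agent] = [learner_agent, evidence_agent, task_agent, review_agent]


def run_pipeline(target: str, agents: List[Agent]) -> ItemDraft:
    """Run the agents in order; each stage builds on the previous stage's model."""
    draft = ItemDraft(target=target)
    for agent in agents:
        draft = agent(draft)
    return draft


if __name__ == "__main__":
    item = run_pipeline("PE-1", PIPELINE)
    print(item.log)  # ['learner_agent', 'evidence_agent', 'task_agent', 'review_agent: ok']
```

Passing the agents as an ordered list keeps the stage dependencies explicit: the evidence stage reads the student model and the task stage reads the evidence model, mirroring how ECD links claims to evidence to tasks. The `needs_human_review` flag defaults to `True` because the study stresses that human expertise remains necessary.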