[2510.15297] VERA-MH Concept Paper
Summary
The VERA-MH Concept Paper outlines a framework for evaluating the safety of AI chatbots in mental health contexts, with an initial focus on suicide risk: automated simulations generate conversations with the chatbot under test, which are then scored by an AI judge agent.
Why It Matters
As AI chatbots become increasingly integrated into mental health care, ensuring their safety and effectiveness is crucial. VERA-MH provides a structured approach to evaluate these tools, addressing ethical concerns and enhancing patient care.
Key Takeaways
- VERA-MH automates the evaluation of AI chatbots for mental health applications.
- The framework uses simulated user-agent interactions to assess chatbot responses.
- Initial evaluations have been conducted on prominent AI models like GPT-5 and Claude.
- The project seeks community feedback to refine its evaluation methods.
- Ongoing clinical validation is essential for ensuring the reliability of the evaluation.
Title: VERA-MH Concept Paper
Authors: Luca Belli, Kate Bentley, Will Alexander, Emily Ward, Matt Hawrilenko, Kelly Johnston, Mill Brown, Adam Chekroud
Subjects: Computer Science > Computers and Society (arXiv:2510.15297, cs)
Submitted on 17 Oct 2025 (v1); last revised 19 Feb 2026 (this version, v3)

Abstract: We introduce VERA-MH (Validation of Ethical and Responsible AI in Mental Health), an automated evaluation of the safety of AI chatbots used in mental health contexts, with an initial focus on suicide risk. Practicing clinicians and academic experts developed a rubric informed by best practices for suicide risk management for the evaluation. To fully automate the process, we used two ancillary AI agents. A user-agent model simulates users engaging in a mental health-based conversation with the chatbot under evaluation. The user-agent role-plays specific personas with pre-defined risk levels and other features. Simulated conversations are then passed to a judge-agent who scores them based on the rubric. The final evaluation of the chatbot being tested is obtained by aggregating the scoring of each conversation. VERA-MH is actively under development and undergoing rigorous validation by mental health clinicians to ensure user-agents realistically act as patients and that the judge-agent accurately scores the AI chatbot. To date we have conducted preliminary evaluation ...
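The simulate-then-judge pipeline described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the authors' implementation: the `Persona`, `user_agent_turn`, `chatbot_turn`, and `judge_score` names are stand-ins, and the stubs return placeholder text and scores where real LLM calls would go.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical stand-ins for the three roles described in the paper:
# the chatbot under test, a user-agent role-playing a persona with a
# pre-defined risk level, and a judge-agent scoring the transcript
# against a clinician-built rubric.

@dataclass
class Persona:
    name: str
    risk_level: str  # e.g. "low", "moderate", "high"

def user_agent_turn(persona: Persona, history: list[str]) -> str:
    # Stub: a real user-agent would be an LLM prompted with the persona.
    return f"[{persona.name}/{persona.risk_level}] simulated user message"

def chatbot_turn(history: list[str]) -> str:
    # Stub: the AI chatbot being evaluated.
    return "simulated chatbot reply"

def judge_score(transcript: list[str], rubric: dict[str, int]) -> float:
    # Stub: a real judge-agent would score each rubric item from the
    # transcript; here we return a placeholder mean over item weights.
    return mean(rubric.values())

def evaluate(personas: list[Persona], rubric: dict[str, int],
             turns: int = 3) -> float:
    """Run one conversation per persona, score each, and aggregate."""
    scores = []
    for persona in personas:
        history: list[str] = []
        for _ in range(turns):
            history.append(user_agent_turn(persona, history))
            history.append(chatbot_turn(history))
        scores.append(judge_score(history, rubric))
    # Final evaluation: aggregate per-conversation scores (mean here).
    return mean(scores)

# Illustrative rubric items and personas (hypothetical).
rubric = {"acknowledges_risk": 1, "provides_resources": 1, "escalates": 1}
personas = [Persona("alex", "low"), Persona("sam", "high")]
print(round(evaluate(personas, rubric), 2))
```

The key design point the paper describes is the separation of roles: the user-agent controls conversational risk conditions, while the judge-agent applies the rubric independently, so the chatbot under test never sees the scoring criteria.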