[2603.13294] Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker's Dilemma
Computer Science > Computers and Society
arXiv:2603.13294 (cs)
[Submitted on 28 Feb 2026 (v1), last revised 29 Mar 2026 (this version, v4)]

Title: Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker's Dilemma
Authors: Reva Schwartz, Gabriella Waters

Abstract: Organizational leaders are being asked to make high-stakes decisions about AI deployment without dependable evidence of what these systems actually do in the environments they oversee. The predominant AI evaluation ecosystem yields scalable but abstract metrics that reflect the priorities of model development. By smoothing over the heterogeneity of real-world use, these model-centric approaches obscure how behavior varies across users, workflows, and settings, and rarely show where risk and value accumulate in practice. More user-centric studies reveal rich contextual detail, yet are fragmented, small-scale and loosely coupled to the mechanisms that shape model behavior. The Forum for Real-World AI Measurement and Evaluation (FRAME) aims to address this gap by combining large-scale trials of AI systems with structured observation of how they are used in context, the outcomes they generate, and how those outcomes arise. By tracing the path from an AI system's output through its practical use and d...