[2509.23415] From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Computer Science > Artificial Intelligence
arXiv:2509.23415 (cs)
[Submitted on 27 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Authors: Gyubok Lee, Woosog Chay, Heeyoung Kwak, Yeong Hwa Kim, Haanju Yoo, Oksoon Jeong, Meong Hi Son, Edward Choi

Abstract: Despite the impressive performance of LLM-powered agents, their adoption for Electronic Health Record (EHR) data access remains limited by the absence of benchmarks that adequately capture real-world clinical data access flows. In practice, two core challenges hinder deployment: query ambiguity from vague user questions and value mismatch between user terminology and database entries. To address this, we introduce EHR-ChatQA, an interactive database question answering benchmark that evaluates the end-to-end workflow of database agents: clarifying user questions, using tools to resolve value mismatches, and generating correct SQL to deliver accurate answers. To cover diverse patterns of query ambiguity and value mismatch, EHR-ChatQA assesses agents in a simulated environment with an LLM-based user across two interaction flows: Incremental Query Refinement (IncreQA), where users add constraints to existing queries, ...