How do you test AI agents in production? The unpredictability is overwhelming.[D]
About this article
I’ve been in QA for almost a decade. My mental model for quality was always: given input X, assert output Y. Now I’m on a team that’s shipping an LLM-based agent that handles multi-step tasks. I genuinely do not know how to test this in a way that feels rigorous. The thing works. But the output isn’t deterministic. The same input can produce different reasoning chains across runs. Hell even with temp=0 I see variation in tool selection and intermediate steps. My normal instincts don’t map her...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket