LLMs · Machine Learning · AI Infrastructure · AI Agents

How do you test AI agents in production? The unpredictability is overwhelming. [D]

Reddit - Machine Learning · April 27, 2026 · 1 min read

About this article

I’ve been in QA for almost a decade. My mental model for quality was always: given input X, assert output Y. Now I’m on a team that’s shipping an LLM-based agent that handles multi-step tasks, and I genuinely do not know how to test this in a way that feels rigorous. The thing works. But the output isn’t deterministic. The same input can produce different reasoning chains across runs. Hell, even with temp=0 I see variation in tool selection and intermediate steps. My normal instincts don’t map her...
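
One way to make the variation described above concrete is to replay a single input several times and tally the distinct tool-call sequences. This is a minimal sketch, not the poster's method: `run_agent`, `make_fake_agent`, the task string, and the tool names are all hypothetical stand-ins, and the fake agent just cycles through canned traces so the example runs without a model.

```python
import itertools
from collections import Counter
from typing import Callable, List


def tool_sequence_variation(
    run_agent: Callable[[str], List[str]], task: str, runs: int = 10
) -> Counter:
    """Run the agent `runs` times on the same task and count each
    distinct tool-call sequence that was observed."""
    return Counter(tuple(run_agent(task)) for _ in range(runs))


def make_fake_agent() -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a real agent: ignores the task and
    cycles through canned traces, imitating run-to-run variation."""
    traces = itertools.cycle(
        [
            ["search", "summarize"],
            ["search", "summarize"],
            ["lookup", "search", "summarize"],
        ]
    )
    return lambda task: next(traces)


counts = tool_sequence_variation(make_fake_agent(), "refund order 123", runs=6)

# An exact-match check ("given input X, assert output Y") would pass on
# some runs and fail on others, because more than one sequence shows up.
for sequence, n in counts.most_common():
    print(n, " -> ".join(sequence))
```

With a real agent, `run_agent` would need to return the ordered tool names from one run; how those are extracted depends on the framework in use and is left as an assumption here.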


Originally published on April 27, 2026. Curated by AI News.


LLMs

Things I got wrong building a confidence evaluator for local LLMs [D]

I've been building **Autodidact**, a local-first AI agent framework. The central piece is a **confidence evaluator** - something that dec...

Reddit - Machine Learning · 1 min · 11 minutes ago

LLMs

I’m convinced 90% of you building "AI Agents" are just burning money on proxy providers. [D]

Seriously, I just audited my stack and realized I’m spending more on rotating residential proxies than I am on the actual Claude and Open...

Reddit - Machine Learning · 1 min · 11 minutes ago

LLMs

Confusing Website

i'm trying to find a video online and couldn't so i asked ChatGPT by describing the video and i was given a link and i'm trying to make s...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

LLMs

I tested the same prompt across multiple AI models… the differences surprised me

I’ve been experimenting with different AI models lately (ChatGPT, Claude, etc.), and I tried something simple: Using the exact same promp...
