[2603.04370] $\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Computer Science > Artificial Intelligence
arXiv:2603.04370 (cs)
[Submitted on 4 Mar 2026]

Title: $\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Authors: Quan Shi, Alexandra Zytek, Pedram Razavi, Karthik Narasimhan, Victor Barres

Abstract: Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora during live interactions with users. Yet most existing benchmarks evaluate retrieval or tool use independently of each other, creating a gap in realistic, fully agentic evaluation over unstructured data in long-horizon interactions. We introduce $\tau$-Knowledge, an extension of $\tau$-Bench for evaluating agents in environments where success depends on coordinating external, natural-language knowledge with tool outputs to produce verifiable, policy-compliant state changes. Our new domain, $\tau$-Banking, models realistic fintech customer support workflows in which agents must navigate roughly 700 interconnected knowledge documents while executing tool-mediated account updates. Across embedding-based retrieval and terminal-based search, even frontier models with high reasoning budgets achieve only $\sim$25.5% pass^1, with reliabilit...
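The pass^1 figure quoted in the abstract belongs to the pass^k family of metrics used by $\tau$-Bench. As a rough sketch, assuming the $\tau$-Bench definition carries over unchanged (the probability that all k independent trials of a task succeed, averaged over tasks), it could be estimated as below. The function name `pass_hat_k` and the sample counts are hypothetical illustrations, not code or data from the paper.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate pass^k from per-task success counts.

    Assumption: each task was run n_trials times, and successes_per_task[i]
    is the number of successful trials for task i. For a task with c
    successes, C(c, k) / C(n_trials, k) is the fraction of size-k subsets
    of its trials in which every trial succeeded.
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must satisfy 1 <= k <= n_trials")
    if not successes_per_task:
        raise ValueError("at least one task is required")
    total = comb(n_trials, k)
    per_task = [comb(c, k) / total for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Hypothetical counts: 4 tasks, 4 trials each.
counts = [4, 2, 0, 1]
for k in (1, 2, 4):
    print(f"pass^{k} = {pass_hat_k(counts, 4, k):.4f}")
```

With k = 1 the estimate reduces to the plain average success rate. Larger k rewards only tasks that are solved consistently, which is why pass^k is read as a reliability measure.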

Originally published on March 05, 2026. Curated by AI News.
