[2603.04370] $τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

arXiv - AI March 05, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.04370: $τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Computer Science > Artificial Intelligence

arXiv:2603.04370 (cs) [Submitted on 4 Mar 2026]

Title: $τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Authors: Quan Shi, Alexandra Zytek, Pedram Razavi, Karthik Narasimhan, Victor Barres

Abstract: Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora during live interactions with users. Yet most existing benchmarks evaluate retrieval or tool use independently of each other, creating a gap in realistic, fully agentic evaluation over unstructured data in long-horizon interactions. We introduce $\tau$-Knowledge, an extension of $\tau$-Bench for evaluating agents in environments where success depends on coordinating external, natural-language knowledge with tool outputs to produce verifiable, policy-compliant state changes. Our new domain, $\tau$-Banking, models realistic fintech customer support workflows in which agents must navigate roughly 700 interconnected knowledge documents while executing tool-mediated account updates. Across embedding-based retrieval and terminal-based search, even frontier models with high reasoning budgets achieve only $\sim$25.5% pass^1, with reliabilit...
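The excerpt reports a pass^1 score without defining the metric. As a rough sketch, and assuming $\tau$-Knowledge keeps the pass^k definition used in the $\tau$-Bench line of work (the chance that k independent trials of the same task all succeed, estimated per task as C(c, k) / C(n, k) from c successes in n trials and then averaged over tasks), it could be computed as below. The function name and the success counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate pass^k: the mean over tasks of C(c, k) / C(n, k).

    successes_per_task: number of successful trials c for each task
    n_trials: trials run per task (n), assumed equal for every task
    k: number of trials that must all succeed
    """
    if not successes_per_task:
        raise ValueError("need at least one task")
    if not 1 <= k <= n_trials:
        raise ValueError("k must satisfy 1 <= k <= n_trials")
    denom = comb(n_trials, k)
    # comb(c, k) is 0 when c < k, so tasks with too few successes score 0.
    per_task = [comb(c, k) / denom for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Hypothetical counts: 4 tasks, 4 trials each (not results from the paper).
counts = [4, 2, 0, 1]
for k in (1, 2, 4):
    print(k, pass_hat_k(counts, n_trials=4, k=k))
```

With k = 1 the estimate reduces to the plain average success rate; larger k rewards only agents that solve the same task consistently, which is why the score can only stay flat or fall as k grows.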

Originally published on March 05, 2026. Curated by AI News.
