[2601.02663] When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark
Computer Science > Computation and Language

arXiv:2601.02663 (cs)

[Submitted on 6 Jan 2026 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark

Authors: Subha Ghoshal, Ali Al-Bustami

Abstract: Modern large language models (LLMs) increasingly rely on inference-time planning and external tools to improve reasoning. We benchmark this behavior in two real-world settings: event-centric question answering over graph-structured knowledge (Event-QA) and persuasive response generation on Reddit ChangeMyView (CMV). Using LangChain and LangGraph, we compare a one-shot baseline against a plan-execute-replan agent equipped with task-specific tools (DBpedia SPARQL, lookup, and schema exploration; Wikipedia-focused retrieval; and topical web search). We evaluate GPT-4o and GPT-4o-mini under identical workflows on 60 examples each from Event-QA and CMV (3 splits of 20), reporting accuracy, mean end-to-end latency, and per-example token-cost estimates. On Event-QA, the best tool-augmented configuration improves accuracy (e.g., 47.5\% $\rightarrow$ 67.5\% for GPT-4o) while increasing latency by well over an order of magnitude ($\sim$8s $\rightarrow$ $\sim$317s per example...
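The abstract describes a plan-execute-replan agent built on LangGraph. The sketch below shows how such a loop can be wired with a LangGraph StateGraph; the state fields, node bodies, and stop condition are illustrative assumptions for exposition, not the authors' implementation, and the tool dispatch is reduced to a placeholder.

```python
# Minimal plan-execute-replan loop sketch with LangGraph.
# All node logic here is hypothetical; in the paper's setup the execute step
# would call task-specific tools (DBpedia SPARQL/lookup/schema exploration,
# Wikipedia-focused retrieval, or topical web search).
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str       # the Event-QA question or CMV prompt
    plan: List[str]     # remaining steps produced by the planner
    results: List[str]  # tool outputs gathered so far
    answer: str         # final response once the plan is exhausted


def plan(state: AgentState) -> AgentState:
    # Hypothetical planner: in practice an LLM call that drafts a step list.
    state["plan"] = ["look up entity", "run SPARQL query", "synthesize answer"]
    return state


def execute(state: AgentState) -> AgentState:
    # Hypothetical executor: run the next step with a task-specific tool.
    step = state["plan"].pop(0)
    state["results"].append(f"result of: {step}")
    return state


def replan(state: AgentState) -> AgentState:
    # Hypothetical replanner: revise remaining steps, or draft the answer
    # once the plan is exhausted.
    if not state["plan"]:
        state["answer"] = " / ".join(state["results"])
    return state


def should_continue(state: AgentState) -> str:
    # Loop back to execution while steps remain; otherwise terminate.
    return "execute" if state["plan"] else END


graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("execute", execute)
graph.add_node("replan", replan)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_edge("execute", "replan")
graph.add_conditional_edges("replan", should_continue)
app = graph.compile()

print(app.invoke({"question": "example Event-QA question",
                  "plan": [], "results": [], "answer": ""}))
```

The one-shot baseline the paper compares against would correspond to a single LLM call with no graph at all, which is why the latency gap reported in the abstract ($\sim$8s vs. $\sim$317s per example) tracks the number of planner, tool, and replanner round trips the loop introduces.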