[2604.04220] TimeSeek: Temporal Reliability of Agentic Forecasters

arXiv - AI April 07, 2026 3 min read

About this article

Abstract page for arXiv paper 2604.04220: TimeSeek: Temporal Reliability of Agentic Forecasters

arXiv:2604.04220 (cs)

[Submitted on 5 Apr 2026]

Title: TimeSeek: Temporal Reliability of Agentic Forecasters

Authors: Hamza Mostafa, Om Shastri, Dennis Lee

Abstract: We introduce TimeSeek, a benchmark for studying how the reliability of agentic LLM forecasters changes over a prediction market's lifecycle. We evaluate 10 frontier models on 150 CFTC-regulated Kalshi binary markets at five temporal checkpoints, with and without web search, for 15,000 forecasts total. Models are most competitive early in a market's life and on high-uncertainty markets, but much less competitive near resolution and on strong-consensus markets. Web search improves pooled Brier Skill Score (BSS) for every model overall, yet hurts in 12% of model-checkpoint pairs, indicating that retrieval is helpful on average but not uniformly so. Simple two-model ensembles reduce error without surpassing the market overall. These descriptive results motivate time-aware evaluation and selective-deference policies rather than a single market snapshot or a uniform tool-use setting.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2604.04220 [cs.AI] (or arXiv:2604.04220v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2604.04220

arXiv-issued DOI via DataCite (pending registra...
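For readers unfamiliar with the metric named in the abstract, here is a minimal sketch of a Brier Skill Score and a two-model averaging ensemble. It assumes the market price is the reference forecast and that the ensemble is an unweighted mean; the abstract does not spell out the paper's exact pooling or ensembling procedure, so treat both as assumptions. The function names and all numbers are hypothetical toy values, not results from the paper.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    model_probs: Sequence[float],
    reference_probs: Sequence[float],
    outcomes: Sequence[int],
) -> float:
    """BSS = 1 - BS_model / BS_reference.

    Positive values mean the model beats the reference; negative values
    mean it is worse. The reference here is assumed to be the market price.
    """
    bs_ref = brier_score(reference_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(model_probs, outcomes) / bs_ref


def average_ensemble(probs_a: Sequence[float], probs_b: Sequence[float]) -> List[float]:
    """Unweighted mean of two models' probabilities (an assumed ensembling rule)."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


# Hypothetical toy data: four binary markets at one checkpoint.
outcomes = [1, 0, 1, 0]
market = [0.8, 0.2, 0.7, 0.1]   # market-implied probabilities (reference)
model_a = [0.9, 0.4, 0.5, 0.1]  # made-up forecasts
model_b = [0.6, 0.1, 0.8, 0.4]  # made-up forecasts
ensemble = average_ensemble(model_a, model_b)

for name, probs in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, round(brier_skill_score(probs, market, outcomes), 3))
```

In this toy data the averaged forecast has a lower Brier score than either model alone but still scores a negative BSS against the market. The numbers were constructed to illustrate how such a pattern can arise and say nothing about the magnitudes reported in the paper.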

Originally published on April 07, 2026. Curated by AI News.

