[2604.03374] CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

arXiv - AI 3 min read

Computer Science > Computation and Language
arXiv:2604.03374 (cs)
[Submitted on 3 Apr 2026]

Title: CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

Authors: Mete Ismayilzada, Renqing Cuomao, Daniil Yurshevich, Anna Sotnikova, Lonneke van der Plas, Antoine Bosselut

Abstract: Creative problem-solving requires combining multiple cognitive abilities, including logical reasoning, lateral thinking, analogy-making, and commonsense knowledge, to discover insights that connect seemingly unrelated pieces of information. However, most existing benchmarks for large language models (LLMs) evaluate only specific components of this process. Moreover, many creativity-oriented benchmarks rely on artificially constructed brainteasers or contrived scenarios that do not reflect how creative problem-solving occurs in real-world settings.

To address this gap, we introduce CresOWLve, a benchmark for evaluating creative problem-solving using puzzles grounded in real-world knowledge. Problems in CresOWLve require employing multiple creative thinking strategies, retrieving facts from diverse domains, and creatively combining them to arrive at a solution. Evaluating several frontier non-thinking and thinking LLMs, we show that CresOWLve remains highly challenging. Our analysis reveals a consistent performance gap: models pe...

Originally published on April 07, 2026. Curated by AI News.
