[2604.03374] CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
Title: CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
Authors: Mete Ismayilzada, Renqing Cuomao, Daniil Yurshevich, Anna Sotnikova, Lonneke van der Plas, Antoine Bosselut
Subject: Computer Science > Computation and Language, arXiv:2604.03374 (cs)
Submitted: 3 Apr 2026

Abstract: Creative problem-solving requires combining multiple cognitive abilities, including logical reasoning, lateral thinking, analogy-making, and commonsense knowledge, to discover insights that connect seemingly unrelated pieces of information. However, most existing benchmarks for large language models (LLMs) evaluate only specific components of this process. Moreover, many creativity-oriented benchmarks rely on artificially constructed brainteasers or contrived scenarios that do not reflect how creative problem-solving occurs in real-world settings. To address this gap, we introduce CresOWLve, a benchmark for evaluating creative problem-solving using puzzles grounded in real-world knowledge. Problems in CresOWLve require employing multiple creative thinking strategies, retrieving facts from diverse domains, and creatively combining them to arrive at a solution. Evaluating several frontier non-thinking and thinking LLMs, we show that CresOWLve remains highly challenging. Our analysis reveals a consistent performance gap: models pe...