[2510.20091] CreativityPrism: A Holistic Evaluation Framework for Large Language Model Creativity
Summary
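The paper presents CreativityPrism, a framework for evaluating the creativity of large language models (LLMs) across eight tasks in three domains: divergent thinking, creative writing, and logical reasoning. It targets two limitations of existing evaluation methods: heavy reliance on human judges, and fragmentation across domains and definitions of creativity.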
Why It Matters
As LLMs are increasingly used to generate creative content, a standardized evaluation framework is needed to assess their capabilities. CreativityPrism offers a structured, scalable way to compare LLM performance across creative domains, which is useful to AI developers and researchers.
Key Takeaways
- CreativityPrism consolidates eight tasks from three domains into a single taxonomy of creativity.
- The framework emphasizes quality, novelty, and diversity in LLM outputs.
- Proprietary LLMs outperform open-source models in creative writing and logical reasoning.
- High performance in one creative dimension does not guarantee success in others.
- A multi-dimensional evaluation approach is necessary for meaningful assessments.
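To make the multi-dimensional point concrete, here is a minimal Python sketch. It is not the paper's implementation: the task names, the scores, and the `per_dimension_means` helper are hypothetical placeholders, and only the three dimension names and three domain names come from the abstract. The idea is to report one mean per dimension rather than collapse everything into a single number.

```python
from collections import defaultdict
from statistics import mean

# Dimension names come from the abstract; everything else is illustrative.
DIMENSIONS = ("quality", "novelty", "diversity")

# Hypothetical records: (domain, task, dimension, score in [0, 1]).
# Task names and scores are placeholders, not results from the paper.
SCORES = [
    ("divergent thinking", "task_a", "novelty", 0.40),
    ("divergent thinking", "task_a", "diversity", 0.70),
    ("creative writing", "task_b", "quality", 0.80),
    ("creative writing", "task_b", "novelty", 0.30),
    ("logical reasoning", "task_c", "quality", 0.60),
    ("logical reasoning", "task_c", "diversity", 0.50),
]


def per_dimension_means(records):
    """Average scores separately for each dimension (hypothetical helper)."""
    buckets = defaultdict(list)
    for _domain, _task, dimension, score in records:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        buckets[dimension].append(score)
    return {dim: mean(vals) for dim, vals in buckets.items()}


profile = per_dimension_means(SCORES)
for dim in DIMENSIONS:
    print(f"{dim}: {profile[dim]:.2f}")
```

In this toy data the quality mean is about twice the novelty mean, a gap that a single averaged score would hide.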
arXiv:2510.20091 (cs), Computer Science > Computation and Language
Submitted on 23 Oct 2025 (v1); last revised 17 Feb 2026 (this version, v2)
Title: CreativityPrism: A Holistic Evaluation Framework for Large Language Model Creativity
Authors: Zhaoyi Joey Hou, Bowei Alvin Zhang, Yining Lu, Bhiman Kumar Baghel, Anneliese Brei, Ximing Lu, Meng Jiang, Faeze Brahman, Snigdha Chaturvedi, Haw-Shiuan Chang, Daniel Khashabi, Xiang Lorraine Li
Abstract: Creativity is often seen as a hallmark of human intelligence. While large language models (LLMs) are increasingly perceived as generating creative text, there is still no holistic and scalable framework to evaluate their creativity across diverse scenarios. Existing methods of LLM creativity evaluation either heavily rely on humans, limiting speed and scalability, or are fragmented across different domains and different definitions of creativity. To address this gap, we propose CREATIVITYPRISM, an evaluation analysis framework that consolidates eight tasks from three domains, divergent thinking, creative writing, and logical reasoning, into a taxonomy of creativity that emphasizes three dimensions: quality, novelty, and diversity of LLM generations. The framework is designed to be scalable with reliable automatic evaluation judges that have been validat...