[2602.13306] Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique
Summary
This paper presents a framework for automating the scoring and critique of artwork using a fine-tuned vision-language model, achieving strong agreement with expert assessments.
Why It Matters
Automating the assessment of artistic creativity can significantly reduce the labor involved in traditional scoring methods, making it scalable for educational and research purposes. This study bridges the gap between computer vision and art evaluation, potentially transforming how creativity is assessed in various contexts.
Key Takeaways
- The proposed model fine-tunes Qwen2-VL-7B for artwork assessment.
- It utilizes a dataset of 1000 human-created paintings with expert evaluations.
- Achieves a Pearson correlation coefficient of over 0.97 with expert scores, indicating strong agreement between predicted and human ratings.
- Generates qualitative feedback that closely aligns with expert critiques.
- Offers a scalable solution for creativity assessment in educational settings.
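The Pearson coefficient cited in the takeaways measures linear agreement between predicted and expert scores. A minimal sketch of the computation, using made-up scores (not the paper's data):

```python
import numpy as np

def pearson_r(pred, target):
    """Pearson correlation between predicted and expert score lists."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()      # center predictions
    tc = target - target.mean()  # center targets
    return float((pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum()))

# Toy illustration with invented scores on the paper's 1-100 scale.
expert = [72, 85, 60, 90, 78]
model = [70, 88, 62, 91, 75]
print(round(pearson_r(model, expert), 3))
```

A coefficient above 0.97, as reported, means the model's rank ordering and spread of scores almost exactly track the expert raters'.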
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.13306 (cs)
[Submitted on 9 Feb 2026]
Title: Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique
Authors: Zhehan Zhang, Meihua Qian, Li Luo, Siyu Huang, Chaoyi Zhou, Ripon Saha, Xinxin Song
Abstract: Assessing artistic creativity is foundational to creativity research and arts education, yet manual scoring (e.g., Torrance Tests of Creative Thinking) is labor-intensive at scale. Prior machine-learning approaches show promise for visual creativity scoring, but many rely mainly on image features and provide limited or no explanatory feedback. We propose a framework for automated creativity assessment of human paintings by fine-tuning the vision-language model Qwen2-VL-7B with multi-task learning. Our dataset contains 1000 human-created paintings scored on a 1-100 scale and paired with a short human-written description (content or artist explanation). Two expert raters evaluated each work using a five-dimension rubric (originality, color, texture, composition, content) and provided written critiques; we use an 80/20 train-test split. We add a lightweight regression head on the visual encoder output so the model can predict a numerical score and generate rubric-aligned feedback in a single forward pass. By embedding the structured rubric...
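The abstract describes a lightweight regression head attached to the visual encoder output, so one forward pass yields both a numerical score and feedback. A minimal NumPy sketch of that idea; the feature dimension, pooling, and sigmoid range mapping are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimension for the pooled visual-encoder output
# (Qwen2-VL-7B's real hidden size is much larger; 16 keeps the sketch small).
FEAT_DIM = 16

# Lightweight regression head: a single linear layer over the pooled
# visual features, squashed onto the paper's 1-100 scoring range.
W = rng.normal(scale=0.1, size=(FEAT_DIM,))
b = 0.0

def score_head(pooled_features):
    """Map pooled visual features to a creativity score in [1, 100]."""
    raw = pooled_features @ W + b                # scalar logit
    return 1.0 + 99.0 / (1.0 + np.exp(-raw))     # sigmoid scaled to (1, 100)

# Stand-in for an encoder's pooled output for one painting.
features = rng.normal(size=(FEAT_DIM,))
score = score_head(features)
assert 1.0 <= score <= 100.0
```

In the paper's multi-task setup, this scalar loss would be trained jointly with the language-modeling loss for the rubric-aligned critique; the sketch above shows only the scoring branch.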