[2602.13306] Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique

arXiv - Machine Learning 4 min read Article

Summary

This paper presents a framework for automating the scoring and critique of artwork using a fine-tuned vision-language model, achieving a Pearson correlation above 0.97 with expert scores.

Why It Matters

Automating the assessment of artistic creativity can significantly reduce the labor involved in traditional scoring methods, making it scalable for educational and research purposes. This study bridges the gap between computer vision and art evaluation, potentially transforming how creativity is assessed in various contexts.

Key Takeaways

  • The proposed framework fine-tunes Qwen2-VL-7B for artwork assessment.
  • It utilizes a dataset of 1000 human-created paintings with expert evaluations.
  • Achieves a Pearson correlation coefficient of over 0.97, indicating strong agreement with expert scores.
  • Generates qualitative feedback that closely aligns with expert critiques.
  • Offers a scalable solution for creativity assessment in educational settings.
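
For context, the Pearson coefficient cited above measures linear agreement between predicted and expert scores. A minimal computation, using made-up toy values rather than the paper's data, looks like:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted vs. expert scores on a 1-100 scale.
predicted = [72, 85, 60, 90, 78]
expert    = [70, 88, 62, 91, 75]
r = pearson(predicted, expert)  # r ≈ 0.977 for these toy values
```

A value near 1.0 means the model's scores rise and fall almost exactly in step with the experts' scores, which is what the paper reports on its held-out test split.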

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.13306 (cs) [Submitted on 9 Feb 2026]

Title: Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique

Authors: Zhehan Zhang, Meihua Qian, Li Luo, Siyu Huang, Chaoyi Zhou, Ripon Saha, Xinxin Song

Abstract: Assessing artistic creativity is foundational to creativity research and arts education, yet manual scoring (e.g., Torrance Tests of Creative Thinking) is labor-intensive at scale. Prior machine-learning approaches show promise for visual creativity scoring, but many rely mainly on image features and provide limited or no explanatory feedback. We propose a framework for automated creativity assessment of human paintings by fine-tuning the vision-language model Qwen2-VL-7B with multi-task learning. Our dataset contains 1000 human-created paintings scored on a 1-100 scale and paired with a short human-written description (content or artist explanation). Two expert raters evaluated each work using a five-dimension rubric (originality, color, texture, composition, content) and provided written critiques; we use an 80/20 train-test split. We add a lightweight regression head on the visual encoder output so the model can predict a numerical score and generate rubric-aligned feedback in a single forward pass. By embedding the structured rubric...
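
The multi-task setup in the abstract pairs a lightweight regression head on the visual encoder's pooled features with the usual language-modeling loss for the critique text. The sketch below is illustrative only: the names (`regression_head`, `multitask_loss`), the linear head, and the loss weight `lam` are assumptions, not the paper's actual architecture or hyperparameters.

```python
def regression_head(pooled_features, head_weights, bias):
    # Linear head: map pooled visual-encoder features to a scalar score.
    return sum(f * w for f, w in zip(pooled_features, head_weights)) + bias

def multitask_loss(lm_loss, predicted_score, target_score, lam=0.1):
    # Joint objective: language-modeling loss for the generated critique
    # plus a weighted MSE term for the numerical score (lam is assumed).
    mse = (predicted_score - target_score) ** 2
    return lm_loss + lam * mse

# Toy example with made-up numbers.
features = [0.2, -0.5, 1.1]
weights = [10.0, 5.0, 20.0]
score = regression_head(features, weights, bias=50.0)
loss = multitask_loss(lm_loss=2.3, predicted_score=score, target_score=80.0)
```

Training both objectives jointly is what lets the fine-tuned model emit a score and rubric-aligned feedback in a single forward pass, as the abstract describes.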

