[2602.00381] Modeling Image-Caption Rating from Comparative Judgments

[2602.00381] Modeling Image-Caption Rating from Comparative Judgments

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2602.00381: Modeling Image-Caption Rating from Comparative Judgments

Computer Science > Computer Vision and Pattern Recognition arXiv:2602.00381 (cs) [Submitted on 30 Jan 2026 (v1), last revised 24 Mar 2026 (this version, v2)] Title:Modeling Image-Caption Rating from Comparative Judgments Authors:Kezia Minni, Qiang Zhang, Monoshiz Mahbub Khan, Zhe Yu View a PDF of the paper titled Modeling Image-Caption Rating from Comparative Judgments, by Kezia Minni and 3 other authors View PDF Abstract:Image caption rating is becoming increasingly important because computer-generated captions are used extensively for descriptive annotation. However, rating the accuracy of captions in describing images is time-consuming and subjective in nature. In contrast, it is often easier for people to compare (between two pairs) which image-caption pair better matches each other. In this study, we propose a machine learning framework that models such comparative judgments instead of direct ratings. The model can then be applied to rank unseen image-caption pairs in the same way as a regression model trained on direct ratings. Inspired by a state-of-the-art regression approach, we extracted visual and text features using a pre-trained ViLBERT model and tweaked the learning parameters of the baseline model to improve the model performance. This new regression model (with Kendall's $\tau_c=0.812$) outperformed the baseline model (with Kendall's $\tau_c=0.758$) on the VICR dataset. The same model structure was applied to the comparative learning framework. Trained on c...

Originally published on March 26, 2026. Curated by AI News.

Related Articles

[2603.18940] Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
Llms

[2603.18940] Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought

Abstract page for arXiv paper 2603.18940: Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty ...

arXiv - Machine Learning · 3 min ·
[2512.20620] Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps
Machine Learning

[2512.20620] Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps

Abstract page for arXiv paper 2512.20620: Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness ...

arXiv - Machine Learning · 4 min ·
[2512.13607] Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Machine Learning

[2512.13607] Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Abstract page for arXiv paper 2512.13607: Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

arXiv - Machine Learning · 4 min ·
[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Machine Learning

[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Abstract page for arXiv paper 2512.02650: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

arXiv - Machine Learning · 3 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime