[2602.00381] Modeling Image-Caption Rating from Comparative Judgments
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.00381 (cs)
[Submitted on 30 Jan 2026 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Modeling Image-Caption Rating from Comparative Judgments
Authors: Kezia Minni, Qiang Zhang, Monoshiz Mahbub Khan, Zhe Yu

Abstract: Image-caption rating is becoming increasingly important because computer-generated captions are used extensively for descriptive annotation. However, rating how accurately a caption describes an image is time-consuming and subjective. In contrast, it is often easier for people to compare two image-caption pairs and judge which caption better matches its image. In this study, we propose a machine learning framework that models such comparative judgments instead of direct ratings. The resulting model can then rank unseen image-caption pairs just as a regression model trained on direct ratings would. Inspired by a state-of-the-art regression approach, we extracted visual and text features with a pre-trained ViLBERT model and tuned the learning parameters of the baseline model to improve performance. This new regression model (Kendall's $\tau_c=0.812$) outperformed the baseline model (Kendall's $\tau_c=0.758$) on the VICR dataset. The same model structure was applied to the comparative learning framework. Trained on c...
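The core idea, learning a scorer from pairwise "which pair matches better" judgments rather than absolute ratings, can be sketched with a Bradley-Terry-style logistic loss over a linear scorer. The feature vectors, dimensions, and training loop below are illustrative stand-ins (the paper extracts features with ViLBERT and uses its own model structure), not the authors' implementation:

```python
import math

def train_pairwise(judgments, dim, epochs=200, lr=0.1):
    """Learn a linear scorer w from comparative judgments.

    Each judgment is (x_win, x_lose), where x_win is the feature vector of the
    image-caption pair judged the better match. We minimize the Bradley-Terry
    logistic loss -log sigma(w.x_win - w.x_lose) by gradient descent, so the
    model only ever sees comparisons, never absolute ratings.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for x_win, x_lose in judgments:
            diff = [a - b for a, b in zip(x_win, x_lose)]
            s = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-s))   # P(winner beats loser)
            g = 1.0 - p                      # -dLoss/ds for the logistic loss
            for i, di in enumerate(diff):
                w[i] += lr * g * di
    return w

def score(w, x):
    """Score an unseen image-caption pair; higher means a better match."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy 2-D features where the first coordinate loosely encodes caption quality.
judgments = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.4], [0.3, 0.6]),
    ([0.8, 0.2], [0.1, 0.5]),
]
w = train_pairwise(judgments, dim=2)
```

Once trained, `score` induces a ranking over unseen pairs, which is exactly what rank correlations such as Kendall's $\tau_c$ evaluate against human ratings.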