[2603.07990] MJ1: Multimodal Judgment via Grounded Verification

arXiv:2603.07990 (cs)
[Submitted on 9 Mar 2026 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: MJ1: Multimodal Judgment via Grounded Verification

Authors: Bhavesh Kumar, Dylan Feng, Leonard Tang

Abstract: Multimodal judges struggle to ground decisions in visual evidence. We present MJ1, a multimodal judge trained with reinforcement learning that enforces visual grounding through a structured grounded verification chain (observations $\rightarrow$ claims $\rightarrow$ verification $\rightarrow$ evaluation $\rightarrow$ scoring) and a counterfactual consistency reward that penalizes position bias. Even without training, our mechanism improves base-model accuracy on MMRB2 by +3.8 points on Image Editing and +1.7 on Multimodal Reasoning. After training, MJ1, with only 3B active parameters, achieves 77.0% accuracy on MMRB2 and surpasses orders-of-magnitude larger models like Gemini-3-Pro. These results show that grounded verification and consistency-based training substantially improve multimodal judgment without increasing model scale.

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.07990 [cs.LG] (or arXiv:2603.07990v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.07990 (arXiv-issued DOI via DataCite)

Submission history: From: Leonard Tang [view...
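The abstract names a counterfactual consistency reward that penalizes position bias but does not give its formula. The sketch below is one plausible reading, not the paper's definition: the judge is assumed to see the same pair of responses twice, once in each order, and a verdict that picks the same slot both times is treated as position-driven. The function name, the `Verdict` type, the `penalty` parameter, and the reward values (+1, 0, -penalty) are all hypothetical.

```python
from typing import Literal

# Hypothetical verdict type: which slot the judge preferred in a given ordering.
Verdict = Literal["first", "second"]


def flip(v: Verdict) -> Verdict:
    """Return the slot the same response occupies after the pair is swapped."""
    return "second" if v == "first" else "first"


def counterfactual_consistency_reward(
    verdict_original: Verdict,
    verdict_swapped: Verdict,
    gold_original: Verdict,
    penalty: float = 1.0,
) -> float:
    """Illustrative reward (an assumption, not the paper's formula).

    +1.0      when the judge picks the gold response in both orderings
    -penalty  when the judge picks the same slot in both orderings,
              i.e. the verdict follows position rather than content
     0.0      when the judge is consistent but wrong in both orderings
    """
    correct_original = verdict_original == gold_original
    correct_swapped = verdict_swapped == flip(gold_original)
    if correct_original and correct_swapped:
        return 1.0
    if verdict_original == verdict_swapped:
        # Same slot chosen before and after the swap: position bias.
        return -penalty
    return 0.0
```

Under these assumptions, a judge that always answers "first" is penalized even on pairs where "first" happens to be correct in one ordering, which is what distinguishes this kind of reward from plain accuracy.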

Originally published on March 25, 2026. Curated by AI News.
