[2603.07990] MJ1: Multimodal Judgment via Grounded Verification
Computer Science > Machine Learning
arXiv:2603.07990 (cs)
[Submitted on 9 Mar 2026 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: MJ1: Multimodal Judgment via Grounded Verification
Authors: Bhavesh Kumar, Dylan Feng, Leonard Tang

Abstract: Multimodal judges struggle to ground decisions in visual evidence. We present MJ1, a multimodal judge trained with reinforcement learning that enforces visual grounding through a structured grounded verification chain (observations $\rightarrow$ claims $\rightarrow$ verification $\rightarrow$ evaluation $\rightarrow$ scoring) and a counterfactual consistency reward that penalizes position bias. Even without training, our mechanism improves base-model accuracy on MMRB2 by +3.8 points on Image Editing and +1.7 on Multimodal Reasoning. After training, MJ1, with only 3B active parameters, achieves 77.0% accuracy on MMRB2 and surpasses orders-of-magnitude larger models like Gemini-3-Pro. These results show that grounded verification and consistency-based training substantially improve multimodal judgment without increasing model scale.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.07990 [cs.LG] (or arXiv:2603.07990v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.07990
Submission history: From Leonard Tang
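The counterfactual consistency reward mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming the judge emits a win probability for the first-listed candidate and scores each pair in both presentation orders; the function name and the exact functional form are hypothetical, not taken from the paper.

```python
def counterfactual_consistency_reward(pref_ab: float, pref_ba: float) -> float:
    """Penalize position bias in a pairwise judge (illustrative sketch).

    pref_ab: judge's probability that the first candidate wins when the
             pair is shown in order (A, B), i.e. P(A beats B | order A,B).
    pref_ba: the same probability when the pair is shown in order (B, A),
             i.e. P(B beats A | order B,A).

    A position-unbiased judge satisfies pref_ab + pref_ba = 1, so the
    reward is zero at perfect consistency and grows more negative as the
    verdict depends on presentation order.
    """
    return -abs(pref_ab - (1.0 - pref_ba))


if __name__ == "__main__":
    # Consistent judge: A wins regardless of order, so no penalty.
    print(counterfactual_consistency_reward(0.9, 0.1))
    # Biased judge: always prefers whichever candidate is listed first,
    # so the reward is strongly negative.
    print(counterfactual_consistency_reward(0.9, 0.9))
```

In a reinforcement-learning setup like the one the abstract describes, a term of this shape could be added to the accuracy reward so the policy is penalized whenever swapping candidate order flips its verdict.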