[2603.28387] The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

arXiv - Machine Learning · March 31, 2026 · 4 min read

Computer Science > Artificial Intelligence
arXiv:2603.28387 (cs)
[Submitted on 30 Mar 2026]

Title: The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

Authors: Doan Nam Long Vu, Simone Balloccu

Abstract: Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, FOR2107 (affective disorders) and OASIS-3 (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely mentioning MRI availability in the task prompt accounts for 70-80% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the scaffold effect. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, wh...
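One way to read the 70-80% figure is as a ratio of confidence shifts across three prompt conditions. The sketch below is only an illustration of that arithmetic: this excerpt does not describe the paper's actual contrastive confidence analysis, so the three-condition setup, the function name scaffold_fraction, and every number are assumptions made for the example.

```python
from statistics import mean


def scaffold_fraction(baseline, mention_only, with_image):
    """Share of the mean confidence shift explained by the prompt mention alone.

    Hypothetical decomposition (not taken from the paper):
      baseline      confidences with a text-only prompt, no MRI mentioned
      mention_only  confidences when the prompt mentions MRI but no image is given
      with_image    confidences when the prompt mentions MRI and an image is given
    """
    total_shift = mean(with_image) - mean(baseline)
    if total_shift == 0:
        raise ValueError("no shift between baseline and the full multimodal condition")
    mention_shift = mean(mention_only) - mean(baseline)
    return mention_shift / total_shift


# Made-up confidences for four cases, for illustration only.
baseline = [0.50, 0.55, 0.45, 0.50]
mention_only = [0.65, 0.70, 0.60, 0.65]
with_image = [0.70, 0.75, 0.65, 0.70]

fraction = scaffold_fraction(baseline, mention_only, with_image)
print(f"{fraction:.2f}")  # prints 0.75 for these made-up numbers
```

A value near 1 would mean that adding the image changes little beyond what the prompt wording already did, which is the pattern the abstract calls the scaffold effect.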

Originally published on March 31, 2026. Curated by AI News.

Related Articles

[2604.01473] SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

[2603.23682] Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots

[2601.07422] Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations

[2603.08486] Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
