[2603.19303] Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology

[2603.19303] Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2603.19303: Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology

Computer Science > Digital Libraries arXiv:2603.19303 (cs) [Submitted on 12 Mar 2026] Title:Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology Authors:Emre Bilgin, Ebru Ozturk, Meera Shah, Lisa Traboco, Rebecca Everitt, Ai Lyn Tan, Marwan Bukhari, Vincenzo Venerito, Latika Gupta View a PDF of the paper titled Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology, by Emre Bilgin and 8 other authors View PDF Abstract:Introduction: Evaluating compliance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement can be time-consuming and subjective. This study compares STROBE assessments from large language models (LLMs), a human reviewer panel, and the original manuscript authors in observational rheumatology research. Methods: Guided by the GRRAS and DEAL Pathway B frameworks, 17 rheumatology articles were independently assessed. Evaluations used the 22-item STROBE checklist, completed by the authors, a five-person human panel (ranging from junior to senior professionals), and two LLMs (ChatGPT-5.2, Gemini-3Pro). Items were grouped into Methodological Rigor and Presentation and Context domains. Inter-rater reliability was calculated using Gwet's Agreement Coefficient (AC1). Results: Overall agreement across all reviewers was 85.0% (AC1=0.826). Domain stratif...

Originally published on March 23, 2026. Curated by AI News.

Related Articles

Llms

OpenClaw security checklist: practical safeguards for AI agents

Here is one of the better quality guides on the ensuring safety when deploying OpenClaw: https://chatgptguide.ai/openclaw-security-checkl...

Reddit - Artificial Intelligence · 1 min ·
I let Gemini in Google Maps plan my day and it went surprisingly well | The Verge
Llms

I let Gemini in Google Maps plan my day and it went surprisingly well | The Verge

Gemini in Google Maps is a surprisingly useful way to explore new territory.

The Verge - AI · 11 min ·
Llms

The person who replaces you probably won't be AI. It'll be someone from the next department over who learned to use it - opinion/discussion

I'm a strategy person by background. Two years ago I'd write a recommendation and hand it to a product team. Now.. I describe what I want...

Reddit - Artificial Intelligence · 1 min ·
Block Resets Management With AI As Cash App Adds Installment Transfers
Llms

Block Resets Management With AI As Cash App Adds Installment Transfers

Block (NYSE:XYZ) plans a permanent organizational overhaul that replaces many middle management roles with AI-driven models to create fla...

AI Tools & Products · 5 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime