[2602.14216] Reasoning Language Models for complex assessments tasks: Evaluating parental cooperation from child protection case reports

arXiv - AI · 4 min read

Summary

This article examines how well reasoning language models (RLMs) can assess parental cooperation during child protection service (CPS) interventions, reporting a clear accuracy gain over the study's initial classification approach.

Why It Matters

The study highlights the potential of RLMs to enhance decision-making in child protection services, addressing complex assessments that often involve ambiguous information. By improving accuracy in evaluating parental cooperation, the findings could lead to better outcomes in child welfare interventions.

Key Takeaways

  • RLMs achieved an accuracy of 89% in assessing parental cooperation.
  • The largest RLM outperformed the study's initial approach, which had an accuracy of 80%.
  • Higher classification accuracy was noted for mothers (93%) compared to fathers (85%).
  • The study underscores the need for balanced attention to both parents in child protection service (CPS) interventions.
  • RLMs can effectively handle complex case factors in child protection scenarios.

Computer Science > Computers and Society — arXiv:2602.14216 (cs)
[Submitted on 15 Feb 2026]

Title: Reasoning Language Models for complex assessments tasks: Evaluating parental cooperation from child protection case reports
Authors: Dragan Stoll, Brian E. Perron, Zia Qi, Selina Steinmann, Nicole F. Eicher, Andreas Jud

Abstract: Purpose: Reasoning language models (RLMs) have demonstrated significant advances in solving complex reasoning tasks. We examined their potential to assess parental cooperation during CPS interventions using case reports, a case factor characterized by ambiguous and conflicting information. Methods: A four-stage workflow was developed, comprising (1) case report collection, (2) reasoning-based assessment of parental cooperation, (3) automated category extraction, and (4) case labeling. The performance of RLMs with different parameter sizes (255B, 32B, 4B) was compared against human-validated data. Two expert human reviewers (EHRs) independently classified a weighted random sample of reports. Results: The largest RLM achieved the highest accuracy (89%), outperforming the initial approach (80%). Classification accuracy was higher for mothers (93%) than for fathers (85%), and EHRs exhibited similar differences. Conclusions: RLMs' reasoning can effectively assess...
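The four-stage workflow described in the abstract can be sketched as a simple pipeline. This is a minimal illustration, not the authors' implementation: all function names and data structures are hypothetical, and a trivial keyword stub stands in for the RLM's reasoning-based assessment so the example is runnable end to end.

```python
from dataclasses import dataclass

@dataclass
class CaseReport:
    """Stage 1 output: a collected case report."""
    case_id: str
    text: str

# Hypothetical label set; the paper's actual categories may differ.
COOPERATION_LABELS = ["partially cooperative", "uncooperative", "cooperative"]

def assess_with_rlm(report: CaseReport) -> str:
    """Stage 2: reasoning-based assessment. In the study an RLM produces a
    free-text reasoning trace; here a keyword stub stands in for it."""
    text = report.text.lower()
    if "refused" in text or "missed" in text:
        return "The parent repeatedly refused contact. Assessment: uncooperative."
    return "The parent engaged with the intervention. Assessment: cooperative."

def extract_category(reasoning: str) -> str:
    """Stage 3: automated category extraction from the reasoning text.
    Longer labels are matched first so 'uncooperative' is not mistaken
    for its substring 'cooperative'."""
    lowered = reasoning.lower()
    for label in sorted(COOPERATION_LABELS, key=len, reverse=True):
        if label in lowered:
            return label
    return "unclear"

def label_cases(reports: list[CaseReport]) -> dict[str, str]:
    """Stage 4: chain the stages to label each collected case."""
    return {r.case_id: extract_category(assess_with_rlm(r)) for r in reports}

reports = [
    CaseReport("A1", "Mother attended all scheduled meetings."),
    CaseReport("B2", "Father refused home visits and missed appointments."),
]
print(label_cases(reports))  # → {'A1': 'cooperative', 'B2': 'uncooperative'}
```

In the study, stage 2 is where model scale matters (255B vs. 32B vs. 4B parameters), while stages 3 and 4 turn free-text reasoning into discrete labels that can be scored against the expert human reviewers' classifications.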

