[2602.18443] From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
Summary
This study evaluates the effectiveness of large language models (LLMs) in generating subject lines for mental health counseling emails, highlighting performance trade-offs and ethical considerations.
Why It Matters
As mental health services increasingly incorporate AI, understanding the effectiveness and ethical implications of LLMs is crucial. This research provides insights into how these models can improve communication in counseling, addressing privacy, bias, and accountability in AI applications.
Key Takeaways
- Evaluates eleven LLMs on generating six-word subject lines for German counseling emails.
- Findings indicate performance trade-offs between proprietary and open-source models.
- German fine-tuning of models consistently enhances performance.
- Highlights critical ethical issues in AI deployment for mental health.
- Analyzes agreement among nine assessors using Krippendorff's α, Spearman's ρ, Pearson's r, and Kendall's τ.
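The correlation measures listed above can be computed directly with SciPy. A minimal sketch, using invented scores for two hypothetical assessors rating the same five subject lines (the data is illustrative, not from the paper):

```python
# Hypothetical sketch: comparing two assessors' scores for the same
# candidate subject lines with the rank/linear correlation measures the
# paper reports (Spearman's rho, Pearson's r, Kendall's tau).
from scipy.stats import spearmanr, pearsonr, kendalltau

# Invented quality scores two assessors assigned to five subject lines.
assessor_a = [4.0, 2.0, 5.0, 1.0, 3.0]
assessor_b = [3.5, 2.5, 5.0, 1.5, 3.0]

rho, _ = spearmanr(assessor_a, assessor_b)   # rank correlation
r, _ = pearsonr(assessor_a, assessor_b)      # linear correlation
tau, _ = kendalltau(assessor_a, assessor_b)  # concordant-pair correlation

print(f"Spearman rho={rho:.2f}, Pearson r={r:.2f}, Kendall tau={tau:.2f}")
```

Here the two assessors order the items identically, so the rank-based measures (ρ, τ) are 1.0 while Pearson's r, which is sensitive to the raw score values, falls slightly below 1 — which is exactly why reporting several measures side by side is informative. Krippendorff's α, which additionally handles missing ratings and more than two raters, is available in the third-party `krippendorff` package.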
arXiv:2602.18443 (cs.HC) [Submitted on 12 Jan 2026]
Title: From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
Authors: Philipp Steigerwald, Jens Albrecht
Abstract: Psychosocial online counselling frequently encounters generic subject lines that impede efficient case prioritisation. This study evaluates eleven large language models generating six-word subject lines for German counselling emails through hierarchical assessment: outputs are first categorised, then ranked within categories to keep evaluation manageable. Nine assessors (counselling professionals and AI systems) enable analysis via Krippendorff's $\alpha$, Spearman's $\rho$, Pearson's $r$ and Kendall's $\tau$. Results reveal performance trade-offs between proprietary services and privacy-preserving open-source alternatives, with German fine-tuning consistently improving performance. The study addresses critical ethical considerations for mental health AI deployment, including privacy, bias and accountability.
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as: arXiv:2602.18443 [cs.HC] (or arXiv:2602.18443v1 [cs.HC] for this version)
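The two-stage evaluation the abstract describes — first categorising outputs, then ranking only within each category — can be sketched in a few lines. The category labels, example subject lines, and scores below are invented for illustration; the paper's actual categories and scoring procedure may differ:

```python
# Minimal sketch (all data hypothetical) of a two-stage hierarchical
# assessment: candidate subject lines are first binned into quality
# categories, then ranked within each category, so assessors only ever
# compare a small, comparable set of items at a time.
from collections import defaultdict

# (subject line, category label, within-category score) — invented.
candidates = [
    ("Hilfe", "generic", 1.0),
    ("Schlaflosigkeit nach Jobverlust seit Monaten", "specific", 4.5),
    ("Frage zu Therapie", "partially specific", 2.5),
    ("Panikattacken vor Pruefungen, bitte um Rat", "specific", 4.0),
]

# Stage one: categorise.
by_category = defaultdict(list)
for line, category, score in candidates:
    by_category[category].append((score, line))

# Stage two: rank within each category (highest score first).
for category, items in by_category.items():
    ranked = sorted(items, reverse=True)
    print(category, [line for _, line in ranked])
```

The design choice this illustrates is that global pairwise ranking of all outputs grows quadratically, while ranking inside small categories keeps the assessors' comparison burden manageable.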