[2601.14172] Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum
Summary
This article covers a paper on sentence-level detection of the 19 human values in the refined Schwartz continuum using transformer models, showing that moral presence is learnable from single sentences and comparing architectures for compute efficiency.
Why It Matters
Understanding how to detect human values in language is crucial for developing AI systems that align with ethical standards. This research provides insights into building more value-aware NLP models, which is increasingly relevant in today's AI landscape.
Key Takeaways
- Moral presence can be effectively learned from sentences using transformer models.
- The study compares direct multi-label detection with presence-gated architectures, revealing limitations in the latter.
- Lightweight auxiliary signals (short-range context, LIWC-22, and moral lexica) and small ensembles are explored to boost value-detection performance.
- The research benchmarks various instruction-tuned LLMs, highlighting their performance relative to supervised models.
- Empirical findings guide the development of compute-efficient, value-aware NLP models.
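The gate-recall bottleneck noted above can be illustrated with a minimal sketch. The probabilities below are synthetic placeholders, not the paper's model outputs; the point is structural: a presence-gated prediction is always a subset of the direct one, so every gate false negative erases all 19 value labels for that sentence at once.

```python
import numpy as np

# Hypothetical sentence-level scores: a binary moral-presence gate and
# per-value probabilities for the 19 Schwartz values (random stand-ins).
rng = np.random.default_rng(0)
n_sentences, n_values = 1000, 19

gate_prob = rng.random(n_sentences)               # P(moral presence)
value_prob = rng.random((n_sentences, n_values))  # P(value v | sentence)

# Direct multi-label prediction: threshold each value independently.
direct = value_prob >= 0.5

# Presence-gated prediction: a sentence the gate rejects gets no values.
gated = direct & (gate_prob >= 0.5)[:, None]

# Gated predictions are a subset of the direct ones, so gating can only
# remove positives: it trades precision for recall at the sentence level.
assert (gated <= direct).all()
assert gated.sum() <= direct.sum()
```

This is why, as the takeaways note, presence gating fails to improve over direct prediction once the gate's recall becomes the limiting factor.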
Computer Science > Computation and Language
arXiv:2601.14172 (cs)
[Submitted on 20 Jan 2026 (v1), last revised 16 Feb 2026 (this version, v3)]
Authors: Víctor Yeste, Paolo Rosso
Abstract: We study sentence-level detection of the 19 human values in the refined Schwartz continuum in about 74k English sentences from news and political manifestos (ValueEval'24 corpus). Each sentence is annotated with value presence, yielding a binary moral-presence label and a 19-way multi-label task under severe class imbalance. First, we show that moral presence is learnable from single sentences: a DeBERTa-base classifier attains positive-class F1 = 0.74 with calibrated thresholds. Second, we compare direct multi-label value detectors with presence-gated hierarchies in a setting where only a single consumer-grade GPU with 8 GB of VRAM is available, and we explicitly choose all training and inference configurations to fit within this budget. Presence gating does not improve over direct prediction, indicating that gate recall becomes a bottleneck. Third, we investigate lightweight auxiliary signals - short-range context, LIWC-22, and moral lexica - and small ensembles. Our best super...
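The "calibrated thresholds" the abstract mentions can be approximated by a simple per-label sweep on held-out validation scores: under severe class imbalance, the default 0.5 cutoff is rarely optimal for positive-class F1. The sketch below uses synthetic scores and a plain grid search; it is an assumed stand-in, not the authors' calibration procedure.

```python
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(y_true, y_prob, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes positive-class F1."""
    scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(scores))])

# Synthetic imbalanced validation set (~10% positives), mimicking the
# severe class imbalance of the 19-way value task.
rng = np.random.default_rng(1)
y_true = (rng.random(2000) < 0.1).astype(int)
# Noisy scores loosely correlated with the labels.
y_prob = np.clip(0.3 * y_true + rng.normal(0.3, 0.15, 2000), 0.0, 1.0)

t = calibrate_threshold(y_true, y_prob)
# Since 0.5 lies on the grid, the tuned threshold never scores worse.
assert f1_score(y_true, y_prob >= t) >= f1_score(y_true, y_prob >= 0.5)
```

For the 19-way multi-label task, the same sweep would be run once per value label, yielding one threshold per class.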