[2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment

arXiv - AI · 3 min read

Summary

This study examines how presentation modality affects preference alignment in AI systems, comparing human and synthetic preference judgments of identical semantic content delivered as text and as audio. It finds that audio ratings are as reliable as text ratings but follow different judgment patterns.

Why It Matters

Understanding modality effects on preference alignment is crucial for building AI systems that accurately reflect human preferences. This research establishes the reliability of audio evaluations and draws out their implications for AI-human interaction, particularly in preference-based reinforcement learning (PbRL) pipelines.

Key Takeaways

  • Audio preferences show high reliability, comparable to text evaluations.
  • Modality influences judgment patterns, with audio raters having narrower decision thresholds.
  • Synthetic ratings align with human judgments and predict inter-rater agreement.
  • This study provides the first ICC-based reliability characterization in the preference-annotation literature for either modality.
  • Understanding these differences can enhance AI systems' alignment with human preferences.
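
The paper reports that average-rating reliability reaches a good level (ICC(2,k) ≈ .80) at roughly 9 raters. The standard way to relate single-rater reliability to the reliability of a k-rater mean is the Spearman–Brown formula; the sketch below assumes a hypothetical single-rater reliability of 0.30, which is an illustrative value and not a figure from the paper.

```python
import math

def spearman_brown(r1: float, k: int) -> float:
    """Reliability of the mean of k raters, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

def raters_needed(r1: float, target: float) -> int:
    """Smallest k whose averaged reliability reaches the target level."""
    k = target * (1 - r1) / (r1 * (1 - target))
    return math.ceil(k)

# Hypothetical single-rater reliability; the paper reports only the k-rater
# figure (ICC(2,k) ~ .80 near 9 raters), so r1 = 0.30 is an assumption.
r1 = 0.30
print(round(spearman_brown(r1, 9), 2))  # reliability of a 9-rater mean
print(raters_needed(r1, 0.80))          # raters needed to reach .80
```

With r1 = 0.30 the formula yields a 9-rater reliability just under .80, broadly consistent with the paper's reported crossover point.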

Computer Science > Sound

arXiv:2602.22710 (cs) [Submitted on 26 Feb 2026]

Title: Same Words, Different Judgments: Modality Effects on Preference Alignment
Authors: Aaron Broukhim, Nadir Weibel, Eshin Jolly

Abstract: Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text, with inter-rater agreement reaching good levels (ICC(2,k) ≈ .80) at ~9 raters -- the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings further align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
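
The reliability figure in the abstract is an intraclass correlation, ICC(2,k): two-way random effects, absolute agreement, averaged over k raters. A minimal sketch of the standard two-way ANOVA computation (the Shrout–Fleiss convention; this is not the paper's code, and the toy data below is purely illustrative):

```python
from statistics import mean

def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, k-rater average.
    `ratings` is an n-subjects x k-raters matrix with no missing cells."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-subjects mean square
    msc = ss_cols / (k - 1)             # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)

# Toy data: 4 items, 3 raters who agree up to a small constant offset.
scores = [[1.0, 1.1, 0.9],
          [2.0, 2.1, 1.9],
          [3.0, 3.1, 2.9],
          [4.0, 4.1, 3.9]]
print(round(icc2k(scores), 3))  # high agreement -> ICC near 1
```

Because ICC(2,k) treats raters as a random sample, it is the natural statistic for asking how many annotators a preference-labeling pipeline needs, which is how the paper arrives at its ~9-rater figure.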

Related Articles

Washington needs AI guardrails — now | Opinion
AI Safety

We need legislation that draws clear lines on what AI systems may and may not do on behalf of the United States government

AI Tools & Products · 3 min
[2601.12910] SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
AI Safety

Abstract page for arXiv paper 2601.12910: SciCoQA: Quality Assurance for Scientific Paper--Code Alignment

arXiv - AI · 3 min
[2509.21385] Debugging Concept Bottleneck Models through Removal and Retraining
Machine Learning

Abstract page for arXiv paper 2509.21385: Debugging Concept Bottleneck Models through Removal and Retraining

arXiv - Machine Learning · 4 min
[2512.00804] Epistemic Bias Injection: Biasing LLMs via Selective Context Retrieval
LLMs

Abstract page for arXiv paper 2512.00804: Epistemic Bias Injection: Biasing LLMs via Selective Context Retrieval

arXiv - AI · 4 min

