[2509.24207] Humanline: Online Alignment as Perceptual Loss

[2509.24207] Humanline: Online Alignment as Perceptual Loss

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2509.24207: Humanline: Online Alignment as Perceptual Loss

Computer Science > Artificial Intelligence arXiv:2509.24207 (cs) [Submitted on 29 Sep 2025 (v1), last revised 27 Mar 2026 (this version, v2)] Title:Humanline: Online Alignment as Perceptual Loss Authors:Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh View a PDF of the paper titled Humanline: Online Alignment as Perceptual Loss, by Sijia Liu and 2 other authors View PDF HTML (experimental) Abstract:Online alignment (e.g., GRPO) is generally more performant than offline alignment (e.g., DPO) -- but why? Drawing on prospect theory from behavioral economics, we propose a human-centric explanation. We prove that online on-policy sampling better approximates the human-perceived distribution of what the model can produce, and PPO/GRPO-style clipping -- originally introduced to just stabilize training -- recovers a perceptual bias in how humans perceive probability. In this sense, PPO/GRPO act as perceptual losses already. Our theory further suggests that the online/offline dichotomy is itself incidental to maximizing human utility, since we can achieve the same effect by selectively training on any data in a manner that mimics human perception, rather than restricting ourselves to online on-policy data. Doing so would allow us to post-train more quickly, cheaply, and flexibly without sacrificing performance. To this end, we propose a design pattern that explicitly incorporates perceptual distortions of probability into objectives like DPO/KTO/GRPO, creating humanline variants of ...

Originally published on March 30, 2026. Curated by AI News.

Related Articles

20+ Best AI Project Ideas for 2026: Trending AI Projects
Ai Startups

20+ Best AI Project Ideas for 2026: Trending AI Projects

This article presents over 20 AI project ideas tailored for various skill levels, providing a roadmap for building portfolio-ready projec...

AI Events ·
Top 10 AI certifications and courses for 2026
Ai Startups

Top 10 AI certifications and courses for 2026

This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and t...

AI Events · 15 min ·
Machine Learning

[P] Looking for people who have had training runs fail unexpectedly to beta test a stability monitor. Free, takes 5 minutes to add to your existing loop. DM me.

Anyone actively training models want to try a stability monitor on a real run? Trying to get real world validation outside my own benchma...

Reddit - Machine Learning · 1 min ·
Llms

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime