[2603.02232] Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Computer Science > Machine Learning
arXiv:2603.02232 (cs) [Submitted on 13 Feb 2026]

Title: Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Authors: Amirhossein Afsharrad, Ruida Zhou, Luca Viano, Sanjay Lall, Mohammad Ghavamzadeh

Abstract: Reward modeling is crucial for aligning large language models with human preferences, yet current approaches lack a principled mathematical framework for leveraging ordinal preference data. When human annotators provide graded preferences on a Likert scale (e.g., significantly better, better, slightly better, negligibly better), existing methods typically apply ad-hoc heuristics, such as margin terms or scaling factors, to loss functions derived from binary preference models like Bradley-Terry. These approaches lack an underlying mathematical model for how ordinal preference data is generated. We present a theoretically grounded framework that formulates reward modeling with Likert-scale preferences as a discrete ordinal regression problem. We derive two loss functions from this formulation: a negative log-likelihood loss and an all-threshold loss, both of which learn threshold parameters that naturally capture the ordinal structure of preferences. Unlike existing heuristic methods that manually specify...
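To make the two losses named in the abstract concrete, below is a minimal PyTorch sketch of (a) a cumulative-link ("ordered logit") negative log-likelihood over learned, ordered thresholds and (b) an all-threshold loss in the style of Rennie & Srebro (2005). This is an illustration under stated assumptions, not the paper's exact formulation: the function names, the logistic link, the softplus reparameterization of the thresholds, and the label convention (0 = negligibly better, ..., K-1 = significantly better) are all choices made here for the sketch.

```python
import torch
import torch.nn.functional as F


def ordered_cuts(raw_thresholds):
    # Reparameterize K-1 free parameters into strictly increasing cut
    # points: the first cut is unconstrained, later cuts add positive
    # softplus gaps, so gradient descent cannot break the ordering.
    gaps = F.softplus(raw_thresholds[1:])
    return torch.cumsum(torch.cat([raw_thresholds[:1], gaps]), dim=0)


def ordinal_nll_loss(reward_diff, labels, raw_thresholds):
    # Cumulative-link negative log-likelihood (an assumed form).
    #   reward_diff:    (B,) tensor of r(response_1) - r(response_2)
    #   labels:         (B,) ints in {0, ..., K-1} from the Likert scale
    #   raw_thresholds: (K-1,) learnable parameters
    cuts = ordered_cuts(raw_thresholds)
    inf = torch.tensor([float("inf")], device=cuts.device)
    edges = torch.cat([-inf, cuts, inf])  # (K+1,) interval edges
    # P(label = k | diff) = sigmoid(edges[k+1] - diff) - sigmoid(edges[k] - diff)
    upper = torch.sigmoid(edges[labels + 1] - reward_diff)
    lower = torch.sigmoid(edges[labels] - reward_diff)
    return -(upper - lower).clamp_min(1e-12).log().mean()


def all_threshold_loss(reward_diff, labels, raw_thresholds):
    # All-threshold loss: every cut point contributes a logistic penalty
    # pushing reward_diff to the correct side of that threshold.
    cuts = ordered_cuts(raw_thresholds)                   # (K-1,)
    ks = torch.arange(cuts.numel(), device=cuts.device)   # (K-1,)
    # +1 for thresholds below the true label, -1 for those at or above it
    signs = (ks.unsqueeze(0) < labels.unsqueeze(1)).float() * 2.0 - 1.0
    margins = signs * (reward_diff.unsqueeze(1) - cuts.unsqueeze(0))
    return F.softplus(-margins).sum(dim=1).mean()


# Example: a 4-level Likert scale gives K = 4 labels and 3 thresholds.
raw = torch.nn.Parameter(torch.zeros(3))
diff = torch.randn(8)            # reward differences for a batch of pairs
y = torch.randint(0, 4, (8,))    # ordinal labels
loss = ordinal_nll_loss(diff, y, raw) + all_threshold_loss(diff, y, raw)
```

The softplus reparameterization is one simple way to keep the cut points monotone during training, which is what lets the learned thresholds "naturally capture the ordinal structure" the abstract refers to; whether the paper uses this particular parameterization is not stated on this page.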