[2604.08986] PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
Computer Science > Computation and Language
arXiv:2604.08986 (cs)
[Submitted on 10 Apr 2026]

Title: PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
Authors: Jihwan Oh, Soowon Oh, Murad Aghazada, Minchan Jeong, Sungnyun Kim, Se-Young Yun

Abstract: Persona prompting has been widely adopted to steer the behavior of large language models (LLMs) and improve their instruction-following performance by assigning specific characters. However, identifying an optimal persona is time-consuming, and its impact on output quality remains poorly understood. Prior work has mainly addressed this issue at the prompt level via inference-time strategies, incurring additional computation. In this work, we avoid inference-time prompt search by tackling persona sensitivity during training, aiming to train models that adapt their behavior to diverse personas while preserving task performance. In particular, we find that reinforcement learning with verifiable rewards (RLVR) systematically reduces sensitivity to persona prompts, but it also reveals an inherent trade-off of outcome-based optimization: while RLVR improves robustness on tasks with verifiable goals, it can degrade persona expressivity when that expressivity is needed, e.g., in in-character role-playing. To address this limitation, we propose PerMix-RLVR, a persona-mixed RLVR st...