[2603.00159] FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation

arXiv - AI 3 min read

Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.00159 (cs) · Submitted on 25 Feb 2026

Title: FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation
Authors: Weiting Tan, Andy T. Liu, Ming Tu, Xinghua Qu, Philipp Koehn, Lu Lu

Abstract: Generating realistic talking-head videos remains challenging due to persistent issues such as imperfect lip synchronization, unnatural motion, and evaluation metrics that correlate poorly with human perception. We propose FlowPortrait, a reinforcement-learning framework for audio-driven portrait animation built on a multimodal backbone for autoregressive audio-to-video generation. FlowPortrait introduces a human-aligned evaluation system based on Multimodal Large Language Models (MLLMs) to assess lip-sync accuracy, expressiveness, and motion quality. These signals are combined with perceptual and temporal consistency regularizers to form a stable composite reward, which is used to post-train the generator via Group Relative Policy Optimization (GRPO). Extensive experiments, including both automatic evaluations and human preference studies, demonstrate that FlowPortrait consistently produces higher-quality talking-head videos, highlighting the effectiveness of reinforcement learning for portrait animation.

Subjects: Computer Vision and Pattern Recognition
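The abstract describes combining MLLM-judged scores with regularizers into a composite reward, then post-training via GRPO. As a rough illustration only (the reward weights, score names, and epsilon term below are our own assumptions, not details from the paper), the composite reward and GRPO's group-relative advantage step can be sketched as:

```python
import numpy as np

def composite_reward(lip_sync, expressiveness, motion, perceptual, temporal,
                     weights=(0.3, 0.2, 0.2, 0.15, 0.15)):
    """Blend MLLM-judged scores (lip-sync, expressiveness, motion) with
    perceptual and temporal-consistency regularizers into one scalar.
    The weights here are illustrative placeholders."""
    scores = np.array([lip_sync, expressiveness, motion, perceptual, temporal])
    return float(np.dot(weights, scores))

def grpo_advantages(rewards):
    """GRPO replaces a learned critic by normalizing each sample's reward
    against its sampling group's mean and standard deviation."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

In GRPO, the generator samples a group of videos per prompt, each is scored with the composite reward, and the normalized advantages weight the policy-gradient update, so no separate value network is needed.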

Originally published on March 03, 2026. Curated by AI News.

Related Articles

[2602.08277] PISCO: Precise Video Instance Insertion with Sparse Control
Generative AI · arXiv - AI · 4 min

[2511.18746] Any4D: Open-Prompt 4D Generation from Natural Language and Images
Machine Learning · arXiv - AI · 4 min

[2512.14549] Dual-objective Language Models: Training Efficiency Without Overfitting
LLMs · arXiv - AI · 3 min

[2510.21011] Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations
LLMs · arXiv - AI · 4 min
