[2502.02088] Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation

arXiv - AI · 4 min read · Article

Summary

The paper presents Dual-IPO, a novel framework for optimizing text-to-video generation by iteratively improving both the reward and video generation models to enhance output quality and user preference alignment.

Why It Matters

As video generation technology advances, ensuring that outputs meet user expectations is crucial. Dual-IPO addresses this by refining the generation process through a dual-iterative approach, potentially transforming how AI-generated videos align with human preferences and improving overall synthesis quality.

Key Takeaways

  • Dual-IPO optimizes video generation through a dual-iterative process.
  • The framework enhances synthesis quality by aligning outputs with user preferences.
  • It utilizes CoT-guided reasoning and voting-based self-consistency for robust reward signals.
  • Experiments show significant improvements in video quality, even with smaller models.
  • The approach eliminates the need for extensive manual preference annotations.
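The voting-based self-consistency idea above can be sketched concretely: sample several CoT-style preference judgments for the same video pair, take the majority label as the reward signal, and use the vote fraction as a certainty estimate. The function below is a minimal illustration; the names and the exact aggregation rule are assumptions, not the paper's implementation.

```python
from collections import Counter

def vote_reward(judgments):
    """Aggregate multiple CoT-style preference judgments by majority vote.

    `judgments` is a list of preference labels (e.g. "A" or "B"), one per
    sampled reasoning chain. Names and format here are illustrative, not
    taken from the paper. Returns the winning label and a certainty score
    in [0, 1] (the fraction of chains that agreed).
    """
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    certainty = votes / len(judgments)
    return label, certainty

# Five sampled reasoning chains, four of which prefer video "A".
label, certainty = vote_reward(["A", "A", "B", "A", "A"])
```

A certainty threshold on this score is one plausible way to realize the paper's "preference certainty estimation": low-agreement pairs can simply be dropped before they reach the generator's training signal.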

Computer Science > Computer Vision and Pattern Recognition

arXiv:2502.02088 (cs) [Submitted on 4 Feb 2025 (v1), last revised 26 Feb 2026 (this version, v5)]

Title: Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation

Authors: Xiaomeng Yang, Mengping Yang, Jia Gong, Luozheng Qin, Zhiyu Tan, Hao Li

Abstract: Recent advances in video generation have enabled thrilling experiences in producing realistic videos driven by scalable diffusion transformers. However, these models often fail to produce outputs aligned with users' authentic demands and preferences. In this work, we introduce Dual-Iterative Preference Optimization (Dual-IPO), an iterative paradigm that sequentially optimizes both the reward model and the video generation model for improved synthesis quality and human preference alignment. For the reward model, our framework ensures reliable and robust reward signals via CoT-guided reasoning, voting-based self-consistency, and preference certainty estimation. Given this, we optimize video foundation models guided by the reward model's feedback, improving synthesis quality in subject consistency, motion smoothness, aesthetic quality, etc. The reward model and video generation model complement each other and are progressively improved in the multi-...
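The abstract's dual-iterative schedule can be sketched as an outer loop that, in each round, first refreshes the reward model and then preference-optimizes the generator against it. The callables below are placeholders for the paper's actual update steps (e.g. fine-tuning on high-certainty pairs, a DPO-style generator update); only the alternation itself is illustrated.

```python
def dual_ipo_loop(refine_reward, optimize_generator, rounds=3):
    """Sketch of the dual-iterative schedule. `refine_reward` and
    `optimize_generator` are placeholder callables, not the paper's API:
    each round refreshes the reward model first, then uses its feedback
    to preference-optimize the video generator. Returns the update log
    so the ordering is visible."""
    log = []
    for r in range(rounds):
        refine_reward(r)        # e.g. fine-tune on high-certainty CoT votes
        log.append(("reward", r))
        optimize_generator(r)   # e.g. DPO-style step on reward feedback
        log.append(("generator", r))
    return log

# Dummy no-op updates, just to show the schedule the loop produces.
schedule = dual_ipo_loop(lambda r: None, lambda r: None, rounds=2)
```

The point of the alternation, per the abstract, is that the two models complement each other: a better reward model yields cleaner training signal for the generator, and a better generator yields harder, more informative comparisons for the reward model.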

Related Articles

Machine Learning

[P] I tested Meta’s brain-response model on posts. It predicted the Elon one almost perfectly.

I built an experimental UI and visualization layer around Meta’s open brain-response model just to see whether this stuff actually works ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] I trained an AI to play Resident Evil 4 Remake using Behavioral Cloning + LSTM

I recorded gameplay trajectories in RE4's village — running, shooting, reloading, dodging — and used Behavioral Cloning to train a model ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Why does it seem like open source materials on ML are incomplete? this is not enough...

Many times when I try to deeply understand a topic in machine learning — whether it's a new architecture, a quantization method, a full t...

Reddit - Machine Learning · 1 min ·
LLMs

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·