[2603.20307] EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.20307 (cs)

[Submitted on 19 Mar 2026]

Title: EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control

Authors: Yuzhe Weng, Haotian Wang, Yuanhong Yu, Jun Du, Shan He, Xiaoyan Wu, Haoran Xu

Abstract: Audio-driven talking head generation aims to create vivid and realistic videos from a static portrait and speech. Existing AR-based methods rely on intermediate facial representations, which limit their expressiveness and realism. Meanwhile, diffusion-based methods generate clip-by-clip, lacking fine-grained control and causing inherent latency due to overall denoising across the window. To address these limitations, we propose EARTalking, a novel end-to-end, GPT-style autoregressive model for interactive audio-driven talking head generation. Our method introduces a novel frame-by-frame, in-context, audio-driven streaming generation paradigm. For inherently supporting variable-length video generation with identity consistency, we propose the Sink Frame Window Attention (SFA) mechanism. Furthermore, to avoid the complex, separate networks that prior works required for diverse control signals, we propose a streaming Frame Condition In-Context (FCIC) scheme. This scheme efficiently injects d...
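The abstract names Sink Frame Window Attention (SFA) but the excerpt does not say how it is built. As a rough illustration only, the sketch below assumes the name means what it suggests: each frame attends causally to a sliding window of recent frames plus a few fixed "sink" frames at the start of the sequence (for example, the reference portrait), which is one plausible way to hold identity steady over variable-length video. The function name sink_window_mask and the parameters num_sink and window are hypothetical and are not taken from the paper.

```python
# Hypothetical frame-level attention mask: causal sliding window plus sink frames.
# This is an assumption based on the mechanism's name, not the paper's implementation.

def sink_window_mask(num_frames, num_sink, window):
    """Return a num_frames x num_frames list of booleans.

    mask[i][j] is True when query frame i may attend to key frame j.
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            causal = j <= i                 # never look at future frames
            in_window = (i - j) < window    # the most recent `window` frames, including i
            is_sink = j < num_sink          # the first `num_sink` frames stay visible
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for 6 frames, 1 sink frame, window of 2 ("x" = may attend).
    for row in sink_window_mask(num_frames=6, num_sink=1, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions the number of keys each frame attends to is bounded by num_sink + window regardless of video length, which is the property that would make streaming, frame-by-frame generation practical. A real model would apply such a mask at token level, with several tokens per frame.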

Originally published on March 24, 2026. Curated by AI News.
