[2603.20307] EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.20307 (cs) [Submitted on 19 Mar 2026]

Title: EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control
Authors: Yuzhe Weng, Haotian Wang, Yuanhong Yu, Jun Du, Shan He, Xiaoyan Wu, Haoran Xu

Abstract: Audio-driven talking head generation aims to create vivid and realistic videos from a static portrait and speech. Existing autoregressive (AR) methods rely on intermediate facial representations, which limit their expressiveness and realism. Diffusion-based methods, meanwhile, generate clip-by-clip, lacking fine-grained control and incurring inherent latency because the whole window must be denoised before any frame can be emitted. To address these limitations, we propose EARTalking, a novel end-to-end, GPT-style autoregressive model for interactive audio-driven talking head generation. Our method introduces a frame-by-frame, in-context, audio-driven streaming generation paradigm. To inherently support variable-length video generation with identity consistency, we propose the Sink Frame Window Attention (SFA) mechanism. Furthermore, to avoid the complex, separate networks that prior works required for diverse control signals, we propose a streaming Frame Condition In-Context (FCIC) scheme. This scheme efficiently injects d...
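
The abstract does not spell out how SFA is implemented. A minimal sketch, assuming SFA behaves like an attention-sink-plus-sliding-window mask at frame granularity (the first "sink" frame stays attendable forever while other frames fall out of a fixed window), might look like this; the function name, frame-level granularity, and window semantics are all illustrative assumptions, not the paper's implementation:

```python
import torch

def sink_frame_window_mask(num_frames: int, window: int, sink_frames: int = 1) -> torch.Tensor:
    """Build a boolean attention mask (True = may attend) where each frame
    attends to the first `sink_frames` frames plus the `window` most recent
    frames. An assumed sketch of a sink-plus-sliding-window pattern."""
    idx = torch.arange(num_frames)
    q, k = idx[:, None], idx[None, :]
    causal = k <= q               # autoregressive: no attending to future frames
    in_window = (q - k) < window  # sliding window over the most recent frames
    is_sink = k < sink_frames     # sink frame(s) remain visible at any length
    return causal & (in_window | is_sink)

# Example: 8 frames, window of 3, one sink frame.
print(sink_frame_window_mask(8, 3).int())
```

Under this reading, memory stays bounded for arbitrarily long videos while the ever-visible sink frame could anchor identity consistency, which would match the abstract's stated motivation.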
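The FCIC description is cut off mid-sentence, so its exact mechanics are unknown. One hedged reading of "Frame Condition In-Context" is that per-frame control embeddings are interleaved with frame tokens in the decoder's input stream rather than routed through separate control networks. The shapes and one-token-per-frame layout below are assumptions for illustration only:

```python
import torch

def interleave_conditions(frame_tokens: torch.Tensor,
                          cond_tokens: torch.Tensor) -> torch.Tensor:
    """Interleave one condition token before each frame's tokens, yielding
    [cond_1, frame_1 tokens, cond_2, frame_2 tokens, ...]. A hypothetical
    in-context injection scheme, not the paper's FCIC implementation.

    frame_tokens: (T, N, D)  T frames, N latent tokens per frame, D channels
    cond_tokens:  (T, 1, D)  one control embedding per frame
    """
    per_frame = torch.cat([cond_tokens, frame_tokens], dim=1)  # (T, N+1, D)
    return per_frame.reshape(-1, per_frame.shape[-1])          # (T*(N+1), D)

frames = torch.randn(4, 16, 64)  # 4 frames of 16 latent tokens each
conds = torch.randn(4, 1, 64)    # per-frame control embeddings
print(interleave_conditions(frames, conds).shape)  # torch.Size([68, 64])
```

Such a layout would let control signals arrive frame by frame in the same stream the model already decodes, consistent with the abstract's streaming, frame-wise control framing.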