[2602.12758] VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction
Summary
The paper presents VineetVC, an adaptive video conferencing system designed to function effectively under severe bandwidth constraints by utilizing audio-driven talking-head reconstruction.
Why It Matters
As remote communication becomes increasingly vital, especially in low-bandwidth environments, this research addresses the challenges of maintaining video quality and stability in real-time conferencing. The proposed solution could enhance user experience in various applications, including telemedicine, remote work, and online education.
Key Takeaways
- VineetVC integrates WebRTC with audio-driven video reconstruction.
- The system operates effectively with a median bandwidth of 32.80 kbps.
- It includes a telemetry-driven bandwidth-mode switching strategy that regulates which media pathway is in use.
- The browser client extracts WebRTC statistics in real time and can export them as CSV telemetry.
- The approach can significantly improve video conferencing stability in constrained networks.
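The mode-switching idea in the takeaways can be illustrated with a minimal sketch. It is not the authors' algorithm: the mode names, the two thresholds, and the three-sample hold are all assumptions chosen for illustration. It shows one common way to avoid flapping between modes, namely hysteresis with a consecutive-sample requirement.

```python
from dataclasses import dataclass

# Hypothetical mode names; the source does not name its modes.
VIDEO = "webrtc_video"
SYNTH = "audio_driven_reconstruction"


@dataclass
class ModeController:
    """Hysteresis-based switch between two conferencing modes (illustrative)."""

    low_kbps: float = 100.0   # assumed threshold: below this, fall back to synthesis
    high_kbps: float = 300.0  # assumed threshold: above this, return to camera video
    hold_samples: int = 3     # consecutive qualifying samples required to switch
    mode: str = VIDEO
    _streak: int = 0

    def update(self, bandwidth_kbps: float) -> str:
        """Feed one bandwidth sample (kbps) and return the mode now in effect."""
        if self.mode == VIDEO:
            wants_switch = bandwidth_kbps < self.low_kbps
        else:
            wants_switch = bandwidth_kbps > self.high_kbps
        # Count consecutive samples that argue for a switch; reset otherwise.
        self._streak = self._streak + 1 if wants_switch else 0
        if self._streak >= self.hold_samples:
            self.mode = SYNTH if self.mode == VIDEO else VIDEO
            self._streak = 0
        return self.mode


def run(samples):
    """Replay a list of bandwidth samples through a fresh controller."""
    controller = ModeController()
    return [controller.update(s) for s in samples]
```

Keeping the upper threshold well above the lower one means a link hovering near a single cutoff does not trigger repeated switches. In a real client the samples would come from WebRTC statistics rather than a list.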
Electrical Engineering and Systems Science > Image and Video Processing
arXiv:2602.12758 (eess)
[Submitted on 13 Feb 2026]
Title: VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction
Authors: Vineet Kumar Rakesh, Soumya Mazumdar, Tapas Samanta, Hemendra Kumar Pandey, Amitabha Das, Sarbajit Pal
Abstract: Intense bandwidth depletion within consumer and constrained networks has the potential to undermine the stability of real-time video conferencing: encoder rate management becomes saturated, packet loss escalates, frame rates deteriorate, and end-to-end latency significantly increases. This work delineates an adaptive conferencing system that integrates WebRTC media delivery with a supplementary audio-driven talking-head reconstruction pathway and telemetry-driven mode regulation. The system consists of a WebSocket signaling service, an optional SFU for multi-party transmission, a browser client capable of real-time WebRTC statistics extraction and CSV telemetry export, and an AI REST service that processes a reference face image and recorded audio to produce a synthesized MP4; the browser can substitute its outbound camera track with ...