[2602.12641] Artic: AI-oriented Real-time Communication for MLLM Video Assistant
Summary
The paper presents Artic, an AI-oriented real-time communication framework designed for Multimodal Large Language Model (MLLM) video assistants, addressing latency and accuracy issues in current systems.
Why It Matters
As AI video assistants become more prevalent, real-time communication has to serve a model that interprets the video rather than a person who watches it. Artic targets the latency spikes and accuracy drops that current RTC frameworks show in this setting, which makes it relevant to developers and researchers in AI and networking.
Key Takeaways
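- Artic is an RTC framework built for MLLM video assistants, shifting the goal from "humans watching video" to "AI understanding video."
- Introduces Response Capability-aware Adaptive Bitrate, which caps the bitrate once MLLM accuracy saturates and keeps the remaining bandwidth as headroom to absorb future fluctuations and reduce latency.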
- Features Zero-overhead Context-aware Streaming to prioritize important video regions.
- Establishes a Degraded Video Understanding Benchmark for evaluating MLLM accuracy.
- Prototype tests show significant improvements in accuracy and latency.
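
The bitrate-capping idea in the Response Capability-aware Adaptive Bitrate takeaway can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the accuracy-versus-bitrate curve, the `tolerance` parameter, and the names `saturation_bitrate` and `choose_bitrate` are all assumptions made for illustration. The sketch only shows the general principle of sending no more than the bitrate at which accuracy stops improving, and treating the unused bandwidth as headroom.

```python
from dataclasses import dataclass

# Hypothetical (bitrate in kbps, MLLM accuracy) samples, invented for this sketch.
# They are NOT measurements from the paper.
EXAMPLE_CURVE = [(200, 0.60), (400, 0.72), (800, 0.80), (1600, 0.81), (3200, 0.82)]


@dataclass
class AbrDecision:
    bitrate_kbps: float   # bitrate actually sent
    headroom_kbps: float  # estimated bandwidth left unused on purpose


def saturation_bitrate(curve, tolerance):
    """Return the lowest bitrate whose accuracy is within `tolerance` of the best.

    `tolerance` is an assumed knob: how much accuracy we accept losing
    in exchange for a lower bitrate.
    """
    if not curve:
        raise ValueError("accuracy curve must not be empty")
    best = max(acc for _, acc in curve)
    for bitrate, acc in sorted(curve):
        if acc >= best - tolerance:
            return bitrate
    # Unreachable: the point holding the best accuracy always passes the check.
    raise AssertionError("no saturation point found")


def choose_bitrate(estimated_bandwidth_kbps, curve, tolerance):
    """Cap the sending bitrate at the saturation point and report the headroom."""
    cap = saturation_bitrate(curve, tolerance)
    bitrate = min(estimated_bandwidth_kbps, cap)
    return AbrDecision(bitrate, estimated_bandwidth_kbps - bitrate)


if __name__ == "__main__":
    # Plenty of bandwidth: send at the cap and keep the rest as headroom.
    print(choose_bitrate(2500, EXAMPLE_CURVE, tolerance=0.03))
    # Scarce bandwidth: the cap is not reached, so there is no headroom.
    print(choose_bitrate(500, EXAMPLE_CURVE, tolerance=0.03))
```

In this toy setup a conventional ABR policy would send close to the full 2500 kbps estimate, while the capped policy sends 800 kbps, so a sudden drop in available bandwidth is less likely to build up a queue and add latency.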
Computer Science > Networking and Internet Architecture
arXiv:2602.12641 (cs)
[Submitted on 13 Feb 2026]
Title: Artic: AI-oriented Real-time Communication for MLLM Video Assistant
Authors: Jiangkai Wu, Zhiyuan Ren, Junquan Zhong, Liming Liu, Xinggong Zhang
Abstract: AI Video Assistant emerges as a new paradigm for Real-time Communication (RTC), where one peer is a Multimodal Large Language Model (MLLM) deployed in the cloud. This makes interaction between humans and AI more intuitive, akin to chatting with a real person. However, a fundamental mismatch exists between current RTC frameworks and AI Video Assistants, stemming from the drastic shift in Quality of Experience (QoE) and more challenging networks. Measurements on our production prototype also confirm that current RTC fails, causing latency spikes and accuracy drops. To address these challenges, we propose Artic, an AI-oriented RTC framework for MLLM Video Assistants, exploring the shift from "humans watching video" to "AI understanding video." Specifically, Artic proposes: (1) Response Capability-aware Adaptive Bitrate, which utilizes MLLM accuracy saturation to proactively cap bitrate, reserving bandwidth headroom to absorb future fluctuations for latency reduction; (2) Zero-overhead Context-aware Streaming, which allocates limited bitrate to regions most...