[2303.09807] TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
Summary
The paper presents TKN, a transformer-based neural network designed for real-time video prediction, achieving a prediction rate of 1,176 fps while substantially reducing computational cost.
Why It Matters
TKN addresses the limitations of traditional video prediction methods, which often sacrifice speed for accuracy. By optimizing processing through a transformer architecture, this research has significant implications for applications requiring real-time predictions, such as surveillance and autonomous systems.
Key Takeaways
- TKN achieves a video prediction rate of 1,176 fps, enhancing real-time applications.
- The model reduces redundant feature computation through unsupervised dynamic content extraction.
- Utilizes an acceleration matrix and parallel computing to lower computational costs.
- Demonstrates superior performance in qualitative and quantitative experiments across multiple datasets.
- Offers potential for broader applications in areas like danger prediction and robotics.
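The "unsupervised dynamic content extraction" in the takeaways refers to learning keypoints from frames without labels. A common building block for such extractors is a spatial soft-argmax that converts detector heatmaps into differentiable (x, y) coordinates. The sketch below illustrates that generic operation only; it is not TKN's actual extractor, and the function name and tensor shapes are illustrative assumptions.

```python
import numpy as np

def soft_argmax(heatmaps):
    """Turn K raw heatmaps of shape (K, H, W) into K (x, y) keypoint
    coordinates via a spatial soft-argmax (softmax-weighted mean of
    pixel coordinates). Differentiable, so it can be trained end to end."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(K, H, W)
    ys, xs = np.mgrid[0:H, 0:W]                      # pixel coordinate grids
    x = (probs * xs).sum(axis=(1, 2))                # expected x per keypoint
    y = (probs * ys).sum(axis=(1, 2))                # expected y per keypoint
    return np.stack([x, y], axis=1)                  # shape (K, 2)
```

A sharply peaked heatmap yields coordinates at its peak; a diffuse one yields a blend, which is what makes the operation trainable with gradient descent.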
Computer Science > Computer Vision and Pattern Recognition
arXiv:2303.09807 (cs)
[Submitted on 17 Mar 2023 (v1), last revised 14 Feb 2026 (this version, v3)]
Title: TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
Authors: Haoran Li, XiaoLu Li, Yihang Lin, Yanbin Hao, Haiyong Xie, Pengyuan Zhou, Yong Liao
Abstract: Video prediction is a complex time-series forecasting task with great potential in many use cases. However, traditional methods prioritize accuracy and overlook the slow prediction speeds caused by complex model structures, redundant information, and excessive GPU memory consumption. These methods often predict frames sequentially, which makes acceleration difficult and limits their applicability in real-time scenarios such as danger prediction. Therefore, we propose a transformer-based keypoint prediction neural network (TKN). TKN extracts dynamic content from video frames in an unsupervised manner, reducing redundant feature computation. TKN also uses an acceleration matrix to reduce the computational cost of attention and employs a parallel computing structure for prediction acceleration. To the best of our knowledge, TKN is the first real-time video prediction solution that achieves a prediction rate of 1,176 fps, significantly reducing computation costs while m...
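The abstract contrasts sequential frame-by-frame prediction, which is hard to accelerate, with TKN's parallel computing structure. The toy below sketches only the parallel idea, not the paper's model: a constant-velocity keypoint predictor that emits all future frames in one vectorized step, where an autoregressive baseline would loop one frame at a time. The function name and the constant-velocity dynamics are illustrative assumptions.

```python
import numpy as np

def predict_parallel(last_kp, velocity, horizon):
    """Predict keypoints for all `horizon` future frames at once.

    last_kp:  (K, 2) keypoint positions in the last observed frame
    velocity: (K, 2) per-keypoint displacement per frame (toy dynamics)
    returns:  (horizon, K, 2) positions for every future frame, computed
              in a single broadcasted operation rather than a loop
    """
    steps = np.arange(1, horizon + 1).reshape(-1, 1, 1)  # (T, 1, 1)
    return last_kp[None] + steps * velocity[None]        # (T, K, 2)
```

Because every future frame depends only on the observed context (not on previously predicted frames), the whole horizon can be computed in one pass, which is the structural property that enables high frame rates.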