[2602.18504] A Computer Vision Framework for Multi-Class Detection and Tracking in Soccer Broadcast Footage

Summary

This paper presents a computer vision framework for detecting and tracking players and the ball in soccer broadcast footage using a single-camera setup, enabling affordable analytics for lower-budget teams.

Why It Matters

Clubs with multi-camera setups or GPS tracking systems gain a data advantage that lower-budget teams cannot match. By working from standard broadcast footage alone, the framework offers a cost-effective alternative that could give amateur and collegiate teams access to similar performance data.

Key Takeaways

  • The framework combines a YOLO object detector with the ByteTrack tracking algorithm to identify and track players, referees, goalkeepers, and the ball.
  • Strong precision, recall, and mAP50 scores indicate high performance in detecting and tracking players and officials.
  • Ball detection remains challenging, highlighting areas for future improvement.
  • The approach reduces reliance on expensive hardware, making analytics accessible to more teams.
  • AI can extract valuable spatial information from standard broadcast footage.
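
To make the detector-plus-tracker idea concrete, here is a minimal sketch of the two-stage association that ByteTrack is known for: tracks are first matched to high-confidence detections, and only then to low-confidence ones, so a briefly occluded player can keep the same ID. This is not the paper's code. The class and function names, the thresholds, the greedy IoU matching (in place of the Hungarian algorithm), the per-class matching, and the absence of a Kalman filter or lost-track buffer are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Detection:  # hypothetical container for one detector output
    box: tuple
    score: float
    label: str  # e.g. "player", "referee", "goalkeeper", "ball"


@dataclass
class Track:  # hypothetical container for one tracked object
    track_id: int
    box: tuple
    label: str


def greedy_match(tracks, dets, min_iou):
    """Pair tracks with same-label detections, highest IoU first."""
    pairs = sorted(
        ((iou(t.box, d.box), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(dets)
         if t.label == d.label),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for overlap, ti, di in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    return matches, used_t, used_d


class SimpleTwoStageTracker:
    """Simplified ByteTrack-style tracker (assumed thresholds, no motion model)."""

    def __init__(self, high_thresh=0.5, min_iou=0.3):
        self.high_thresh = high_thresh
        self.min_iou = min_iou
        self.tracks = []
        self._ids = count(1)

    def update(self, detections):
        high = [d for d in detections if d.score >= self.high_thresh]
        low = [d for d in detections if d.score < self.high_thresh]

        # Stage 1: existing tracks vs. high-confidence detections.
        matches, used_t, used_d = greedy_match(self.tracks, high, self.min_iou)
        for ti, di in matches:
            self.tracks[ti].box = high[di].box

        # Stage 2: still-unmatched tracks vs. low-confidence detections.
        remaining = [t for i, t in enumerate(self.tracks) if i not in used_t]
        matches2, used_t2, _ = greedy_match(remaining, low, self.min_iou)
        for ti, di in matches2:
            remaining[ti].box = low[di].box

        # Simplification: tracks unmatched in both stages are dropped at once.
        lost = {id(t) for i, t in enumerate(remaining) if i not in used_t2}
        self.tracks = [t for t in self.tracks if id(t) not in lost]

        # Only unmatched high-confidence detections start new tracks.
        for di, d in enumerate(high):
            if di not in used_d:
                self.tracks.append(Track(next(self._ids), d.box, d.label))
        return self.tracks
```

In this sketch, a player whose detection score drops below `high_thresh` for a frame is still updated in stage 2 as long as the box overlaps the existing track, while an isolated low-confidence detection never creates a new track.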

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18504 (cs)
[Submitted on 17 Feb 2026]

Title: A Computer Vision Framework for Multi-Class Detection and Tracking in Soccer Broadcast Footage

Authors: Daniel Tshiani

Abstract: Clubs with access to expensive multi-camera setups or GPS tracking systems gain a competitive advantage through detailed data, whereas lower-budget teams are often unable to collect similar information. This paper examines whether such data can instead be extracted directly from standard broadcast footage using a single-camera computer vision pipeline. This project develops an end-to-end system that combines a YOLO object detector with the ByteTrack tracking algorithm to identify and track players, referees, goalkeepers, and the ball throughout a match. Experimental results show that the pipeline achieves high performance in detecting and tracking players and officials, with strong precision, recall, and mAP50 scores, while ball detection remains the primary challenge. Despite this limitation, our findings demonstrate that AI can extract meaningful player-level spatial information from a single broadcast camera. By reducing reliance on specialized hardware, the proposed approach enables colleges, academies, and amateur clubs to adopt scalable, data-driven analysis methods previously ac...
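
For readers unfamiliar with the metrics named in the abstract, the sketch below shows how precision and recall can be computed for one class in one frame when a prediction counts as correct if its IoU with a ground-truth box is at least 0.5. mAP50 builds on the same matching rule: it averages, across classes, the area under the precision-recall curve at that IoU threshold. The function names, the input layout, and the greedy matching rule are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thresh=0.5):
    """Precision and recall for one class in one frame.

    preds:  list of (box, score) pairs from the detector (assumed layout)
    truths: list of ground-truth boxes
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_pos = 0
    # Visit predictions from most to least confident.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            true_pos += 1
    false_pos = len(preds) - true_pos    # predictions with no matching truth
    false_neg = len(truths) - true_pos   # truths that no prediction found
    precision = true_pos / (true_pos + false_pos) if preds else 0.0
    recall = true_pos / (true_pos + false_neg) if truths else 0.0
    return precision, recall


# One correct player box, one spurious box, one missed player.
preds = [((0, 0, 10, 10), 0.9), ((100, 100, 110, 110), 0.8)]
truths = [(1, 0, 11, 10), (50, 50, 60, 60)]
print(precision_recall(preds, truths))  # (0.5, 0.5)
```

A small, fast-moving object such as the ball tends to produce low IoU values even for near-miss predictions, which is one plausible reason a metric with a fixed 0.5 threshold is hard to score well on for that class.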
