[2603.00173] Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model
arXiv:2603.00173 (cs)
[Submitted on 26 Feb 2026]
Title: Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model
Authors: Simo Ryu, Chunghwan Han

Abstract: We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional model trained on approximately 50 million clips. We outline our approach combining metadata-driven dataset curation, multi-stage filtering, $\mu$P parameterization, and hypersphere-constrained optimization. We developed the Lavender Data system for dataset management and adopted inference-aware architectural choices. We share observations on what worked in our setting: dataset engineering consumed the majority of effort, architectural variants showed smaller differences than we expected, and $\mu$P hyperparameter transfer appeared effective even under geometric constraints. We hope this account proves useful to others undertaking similar projects.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: ar...