Introducing Waypoint-1: Real-time interactive video diffusion from Overworld
Published January 20, 2026

Authors: Andrew Lapp (lapp0), Louis Castricato (LouisCastricato), Scott Fox (ScottieFox), Shahbuland Matiana (shahbuland), David Rossi (xAesthetics)

Waypoint-1 weights on the Hub: Waypoint-1-Small, Waypoint-1-Medium (coming soon!)

Try out the model: Overworld Stream, https://overworld.stream

What is Waypoint-1?

Waypoint-1 is Overworld's real-time interactive video diffusion model, controllable and promptable via text, mouse, and keyboard. Give the model a few starting frames, run it, and it creates a world you can step into and interact with. The backbone of the model is a frame-causal rectified flow transformer trained on 10,000 hours of diverse video game footage paired with control inputs and text captions. Waypoint-1 is a latent model, meaning it is trained on compressed frames rather than raw pixels.

The standard approach among existing world models has been to take a pre-trained video model and fine-tune it with brief, simplified control inputs. In contrast, Waypoint-1 is trained from the start with a focus on interactive experiences. In other models, controls are coarse: you can move and rotate the camera only once every few frames, with severe latency. With Waypoint-1, controls are unconstrained: you can move the camera freely with the mouse and input any key on th...
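The article does not publish architecture code, but the "frame-causal" structure it names can be illustrated with a small sketch. In a frame-causal transformer, tokens attend bidirectionally within their own frame and causally across frames: a token may see every token in its own frame and in earlier frames, but never in later frames. The helper below (a hypothetical illustration, not Overworld's implementation) builds such a mask:

```python
import numpy as np

def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean attention mask for a frame-causal transformer.

    mask[q, k] is True when query token q may attend to key token k.
    Attention is full within a frame and causal across frames.
    """
    total = num_frames * tokens_per_frame
    # Frame index of each token position, e.g. [0, 0, 1, 1, 2, 2].
    frame_id = np.arange(total) // tokens_per_frame
    # Allowed iff the key's frame is not later than the query's frame.
    return frame_id[:, None] >= frame_id[None, :]

mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
# Tokens in frame 0 see each other but not frame 1:
# mask[0, 1] is True, mask[0, 2] is False.
```

This block structure is what lets the model generate each new frame conditioned on all previous frames while still allowing rich spatial attention inside the frame being generated.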