[2604.09429] Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.09429 (cs) [Submitted on 10 Apr 2026]

Title: Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

Authors: Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang

Abstract: Recovering camera parameters from images and rendering scenes from novel viewpoints have long been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task needs what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. We represent each camera as dense ray pixels (raxels) and denoise them jointly with video frames through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, jointly generating video and camera trajectory from input images, and generating video from input images along a target camera trajectory. Because the model can both predict trajectories from a video and generate views conditioned on its own predictions, we evaluate it through a closed-loop self-consistency test, demo...
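The abstract's core idea is to encode a camera not as a small parameter vector but as a dense per-pixel ray map ("raxels") with the same spatial layout as a video frame, so rays can be denoised alongside pixels. The paper does not specify the exact parameterization here, so the following is a minimal sketch under a standard pinhole-camera assumption: each pixel gets a world-space ray origin and unit direction computed from intrinsics `K` and a world-to-camera pose `(R, t)`. The function name `raxel_map` is hypothetical, not from the paper.

```python
import numpy as np

def raxel_map(K, R, t, H, W):
    """Sketch of a per-pixel ray map: world-space origin and unit
    direction for every pixel of an H x W image.

    K: (3, 3) pinhole intrinsics; R: (3, 3) world-to-camera rotation;
    t: (3,) translation. All names/shapes are illustrative assumptions.
    """
    # Camera center in world coordinates: c = -R^T t
    c = -R.T @ t
    # Pixel-center grid in homogeneous image coordinates
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (H, W, 3)
    # Back-project through K^{-1}, rotate camera rays into the world
    # frame (row-vector form of d_world = R^T K^{-1} pix), normalize
    dirs = pix @ np.linalg.inv(K).T @ R                     # (H, W, 3)
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(c, dirs.shape)                # (H, W, 3)
    return origins, dirs
```

Stacking `origins` and `dirs` channel-wise yields a 6-channel "image" aligned with the RGB frame, which is what lets a diffusion model treat camera trajectories with the same machinery as video pixels.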