[2602.13003] MASAR: Motion-Appearance Synergy Refinement for Joint Detection and Trajectory Forecasting
Summary
The paper presents MASAR, a novel framework for joint 3D detection and trajectory forecasting that enhances performance by integrating motion and appearance cues.
Why It Matters
This research addresses limitations in current autonomous driving systems by proposing a fully differentiable model that improves the accuracy of trajectory predictions. By leveraging both motion and appearance data, MASAR enhances the synergy between perception and prediction, which is critical for the advancement of autonomous technologies.
Key Takeaways
- MASAR improves trajectory forecasting by over 20% in key metrics.
- The framework integrates motion and appearance features for better performance.
- Compatible with any transformer-based 3D detector, enhancing versatility.
- Utilizes an object-centric spatio-temporal mechanism for encoding features.
- Demonstrates robust detection performance alongside trajectory improvements.
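To make the "object-centric spatio-temporal mechanism" takeaway concrete, here is a minimal, hypothetical sketch of one fusion step: per-object motion tokens attend over that object's appearance history via scaled dot-product cross-attention, producing appearance-refined motion features. The shapes, names, and the residual fusion are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query attends over keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.swapaxes(-1, -2) / np.sqrt(d)  # (N, T, T)
    return softmax(scores, axis=-1) @ values               # (N, T, D)

# Hypothetical shapes: N tracked objects, T past frames, feature dim D.
N, T, D = 5, 8, 32
rng = np.random.default_rng(0)
appearance = rng.normal(size=(N, T, D))  # per-frame visual features per object
motion = rng.normal(size=(N, T, D))      # encoded past-position features per object

# Motion tokens query the same object's appearance history; a residual
# connection keeps the original motion signal (one fusion step).
fused = motion + cross_attend(motion, appearance, appearance)
print(fused.shape)  # (5, 8, 32)
```

In a real transformer-based detector this step would use learned query/key/value projections and multiple heads; the sketch keeps only the attention pattern that lets appearance cues refine motion features per object.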
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.13003 (cs) [Submitted on 13 Feb 2026]
Authors: Mohammed Amine Bencheikh Lehocine, Julian Schmidt, Frank Moosmann, Dikshant Gupta, Fabian Flohr
Abstract: Classical autonomous driving systems connect perception and prediction modules via hand-crafted bounding-box interfaces, limiting information flow and propagating errors to downstream tasks. Recent research aims to develop end-to-end models that jointly address perception and prediction; however, they often fail to fully exploit the synergy between appearance and motion cues, relying mainly on short-term visual features. We follow the idea of "looking backward to look forward" and propose MASAR, a novel fully differentiable framework for joint 3D detection and trajectory forecasting compatible with any transformer-based 3D detector. MASAR employs an object-centric spatio-temporal mechanism that jointly encodes appearance and motion features. By predicting past trajectories and refining them using guidance from appearance cues, MASAR captures long-term temporal dependencies that enhance future trajectory forecasting. Experiments conducted on the nuScenes d...
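The "looking backward to look forward" idea from the abstract can be illustrated with a toy joint trajectory head: a single decoder emits both past and future waypoints from an object feature, so supervising the past branch against the observed history shapes the features used for forecasting. Every dimension and weight here is an illustrative assumption, not MASAR's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def trajectory_head(feat, w1, w2):
    """Tiny MLP mapping an object feature to a flat sequence of 2D waypoints."""
    h = np.maximum(feat @ w1, 0.0)  # ReLU hidden layer
    return h @ w2

# Hypothetical dims: D-dim object feature -> (T_past + T_future) 2D waypoints.
D, H, T_past, T_future = 32, 64, 4, 6
w1 = rng.normal(scale=0.1, size=(D, H))
w2 = rng.normal(scale=0.1, size=(H, (T_past + T_future) * 2))

feat = rng.normal(size=(D,))
traj = trajectory_head(feat, w1, w2).reshape(T_past + T_future, 2)
past, future = traj[:T_past], traj[T_past:]
# Training would supervise `past` against the observed history in addition to
# `future` against ground truth, so long-term temporal structure informs both.
print(past.shape, future.shape)  # (4, 2) (6, 2)
```

The design point is that the past-prediction loss acts as an auxiliary signal: errors in reconstructing the backward trajectory are visible at training time and regularize the shared features that produce the forward forecast.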