[2511.20011] Multi-Context Fusion Transformer for Pedestrian Crossing Intention Prediction in Urban Environments
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.20011 (cs)
[Submitted on 25 Nov 2025 (v1), last revised 21 Mar 2026 (this version, v2)]
Title: Multi-Context Fusion Transformer for Pedestrian Crossing Intention Prediction in Urban Environments
Authors: Yuanzhe Li, Hang Zhong, Steffen Müller
Abstract: Pedestrian crossing intention prediction is essential for autonomous vehicles to improve pedestrian safety and reduce traffic accidents. However, accurate pedestrian intention prediction in urban environments remains challenging due to the multitude of factors affecting pedestrian behavior. In this paper, we propose a multi-context fusion Transformer (MFT) that leverages diverse numerical contextual attributes across four key dimensions, encompassing pedestrian behavior context, environmental context, pedestrian localization context and vehicle motion context, to enable accurate pedestrian intention prediction. MFT employs a progressive fusion strategy, where mutual intra-context attention enables reciprocal interactions within each context, thereby facilitating feature sequence fusion and yielding a context token as a context-specific representation. This is followed by mutual cross-context attention, which integrates features across contexts with a global CLS token serving as a compact...
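
The two-stage fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain scaled dot-product self-attention with identity projections in place of the paper's "mutual" attention (whose details the abstract does not give), treats each context as a single token sequence, and uses a fixed constant where a trained model would use a learned context or CLS token. The names `fuse`, `progressive_fusion`, `DIM`, and `SEQ_LEN`, as well as the random input data, are hypothetical.

```python
import math
import random

DIM = 4        # feature width per token (hypothetical)
SEQ_LEN = 5    # tokens per context sequence (hypothetical)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: a real model would apply learned projections and multiple heads.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return attended

def fuse(tokens, summary_init):
    """Prepend a summary token, attend over all tokens, return the updated summary."""
    return self_attention([summary_init] + tokens)[0]

def progressive_fusion(contexts, dim):
    """Stage 1: one context token per context. Stage 2: one global CLS token."""
    init = [0.1] * dim  # stand-in for a learned token (assumption)
    context_tokens = [fuse(sequence, init) for sequence in contexts.values()]
    return fuse(context_tokens, init)

# Random placeholder features for the four contexts named in the abstract.
rng = random.Random(0)
contexts = {
    name: [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(SEQ_LEN)]
    for name in ("behavior", "environment", "localization", "vehicle_motion")
}
cls_token = progressive_fusion(contexts, DIM)
print(cls_token)
```

In this sketch the returned `cls_token` plays the role of the compact global representation; a classifier head for the crossing/not-crossing decision would sit on top of it, which is left out here.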