[2509.03830] Decoding Tourist Perception in Historic Urban Quarters with Multimodal Social Media Data: An AI-Based Framework and Evidence from Shanghai
Summary
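This study presents an AI-based framework for analyzing tourist perception in 12 historic urban quarters of Shanghai. It combines multimodal social media data (geotagged photos and review texts) with a street view baseline to reveal gaps between how these quarters are represented online and how they appear in street-level imagery.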
Why It Matters
Planners and heritage managers often lack scalable evidence on what visitors notice, prefer, and criticize in historic quarters. This research offers a data-driven way to assess visitor experience at scale, which can inform urban design and management strategies for these areas.
Key Takeaways
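- The study introduces a multimodal AI framework that decodes tourist perception through visual attention, color-based aesthetic representation, and multidimensional satisfaction.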
- It highlights discrepancies between how historic quarters appear in tourist-shared social media imagery and how they appear in a street view baseline.
- Results indicate a systematic foregrounding of specific streetscape elements in tourist photos.
- The framework can inform heritage management and urban design improvements.
- Findings emphasize the importance of assessing visitor satisfaction across multiple experience dimensions; the study uses four.
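The foregrounding result noted above can be made concrete with pixel shares: if a semantic segmentation model labels every pixel, the share of each streetscape class can be averaged over tourist photos and over street view images, and the ratio between the two shows which elements tourists over-represent. This is a minimal sketch, assuming segmentation masks already exist as integer label arrays; the class names and the ratio metric are hypothetical choices for illustration, not necessarily the paper's exact measure.

```python
import numpy as np

# Hypothetical class list for illustration; the paper's segmentation labels may differ.
CLASSES = ["building", "sky", "tree", "road", "person", "signage"]


def mean_class_share(masks, num_classes):
    """Average per-class pixel share over HxW integer label masks (one per image)."""
    shares = []
    for mask in masks:
        # Count pixels per class label; labels >= num_classes are ignored in the counts.
        counts = np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
        shares.append(counts / mask.size)
    return np.mean(shares, axis=0)


def foregrounding_ratio(tourist_masks, streetview_masks, num_classes, eps=1e-9):
    """Per-class ratio of tourist-photo share to street-view share.

    A ratio above 1 means the class fills more of tourist photos than of the
    street view baseline, i.e. tourists foreground it.
    """
    tourist = mean_class_share(tourist_masks, num_classes)
    baseline = mean_class_share(streetview_masks, num_classes)
    # eps guards against division by zero for classes absent from the baseline.
    return tourist / (baseline + eps)


if __name__ == "__main__":
    # Toy 2x2 masks: label 0 = building, 3 = road, 5 = signage.
    tourist = [np.array([[5, 5], [0, 5]])]
    street = [np.array([[0, 0], [3, 5]])]
    ratio = foregrounding_ratio(tourist, street, len(CLASSES))
    for name, r in zip(CLASSES, ratio):
        print(f"{name}: {r:.2f}")
```

With these toy masks, signage gets a ratio of 3.00 and building 0.50. A ratio is easy to read but explodes for classes that are rare in the baseline; a plain difference of shares is a more stable alternative if that becomes a problem.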
arXiv:2509.03830 [cs.AI]. Submitted on 4 Sep 2025 (v1); last revised 22 Feb 2026 (v3).
Authors: Kaizhen Tan, Yufan Wu, Yuxuan Liu, Haoran Zeng
Abstract
Historic urban quarters are increasingly shaped by tourism and lifestyle consumption, yet planners often lack scalable evidence on what visitors notice, prefer, and criticize in these environments. This study proposes an AI-based, multimodal framework to decode tourist perception by combining visual attention, color-based aesthetic representation, and multidimensional satisfaction. We collect geotagged photos and review texts from a major Chinese platform and assemble a street view image set as a baseline for comparison across 12 historic urban quarters in Shanghai. We train a semantic segmentation model to quantify foregrounded visual elements in tourist-shared imagery, extract and compare color palettes between social media photos and street views, and apply a multi-task sentiment classifier to assess satisfaction across four experience dimensions that correspond to activity, physical setting, supporting services, an...
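The color-palette comparison described in the abstract can be sketched as follows: cluster each image's pixels with k-means to obtain a few dominant colors and their pixel shares, then measure how far a tourist-photo palette lies from a street view palette. Everything here is an assumption for illustration: the k-means palette, the choice of k, the RGB space, and the weighted nearest-color distance (a hypothetical metric); the paper may use a different color space such as CIELAB or a different palette metric.

```python
import numpy as np


def _assign(pixels, centers):
    """Index of the nearest center (squared Euclidean distance) for each pixel."""
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def extract_palette(image, k=5, iters=20, seed=0):
    """Return (colors, weights): up to k dominant RGB colors and each one's pixel share.

    Plain Lloyd's k-means in RGB. Downsample large photos first:
    memory grows with (number of pixels) x k.
    """
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    # Seed centers with distinct colors so k never exceeds the number of distinct colors.
    unique = np.unique(pixels, axis=0)
    k = min(k, len(unique))
    centers = unique[rng.choice(len(unique), size=k, replace=False)]
    for _ in range(iters):
        labels = _assign(pixels, centers)
        # Move each center to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    # Final assignment so the weights match the returned centers.
    labels = _assign(pixels, centers)
    weights = np.bincount(labels, minlength=k) / len(pixels)
    return centers, weights


def palette_distance(palette_a, palette_b):
    """Symmetric weighted nearest-color distance between two (colors, weights) palettes.

    Hypothetical metric: each color is matched to its nearest color in the
    other palette, and those distances are averaged by pixel share.
    """
    def one_way(src, dst):
        colors_s, weights_s = src
        colors_d, _ = dst
        diffs = colors_s[:, None, :] - colors_d[None, :, :]
        nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
        return float((weights_s * nearest).sum())

    return 0.5 * (one_way(palette_a, palette_b) + one_way(palette_b, palette_a))


if __name__ == "__main__":
    # Toy "photo": top half red, bottom half white; toy "street view": uniform gray.
    photo = np.zeros((4, 4, 3), dtype=np.uint8)
    photo[:2] = [200, 30, 30]
    photo[2:] = [250, 250, 250]
    street = np.full((4, 4, 3), 128, dtype=np.uint8)
    p_photo, p_street = extract_palette(photo, k=2), extract_palette(street, k=2)
    print("photo palette:", p_photo)
    print("street palette:", p_street)
    print("distance:", round(palette_distance(p_photo, p_street), 2))
```

A palette distance of zero means the two images share the same dominant colors in the same proportions; larger values indicate that tourist photos emphasize colors the street view baseline does not.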