[2509.21609] VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment

arXiv - Machine Learning · 4 min read

Summary

The paper presents VLCE, a framework that enhances image description for disaster assessment by integrating external semantic knowledge, improving the accuracy of visual data interpretation.

Why It Matters

This research addresses the limitations of current visual language models (VLMs) in disaster scenarios by introducing a method that combines AI with domain-specific knowledge. This advancement could significantly improve real-time disaster response efforts, making it highly relevant for emergency management and AI applications in crisis situations.

Key Takeaways

  • VLCE integrates external knowledge sources like ConceptNet and WordNet to enhance image captioning in disaster assessments.
  • The framework utilizes CNN-LSTM and Vision Transformer architectures to process satellite and UAV imagery effectively.
  • VLCE outperforms baseline models, achieving 95.33% accuracy on UAV imagery assessments.
  • The research signifies a shift from basic visual classification to generating actionable intelligence for disaster management.
  • Immediate applicability in real-time systems can improve disaster response efficiency.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2509.21609 (cs) [Submitted on 25 Sep 2025 (v1), last revised 17 Feb 2026 (this version, v5)]

Title: VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment

Authors: Md. Mahfuzur Rahman, Kishor Datta Gupta, Marufa Kamal, Fahad Rahman, Sunzida Siddique, Ahmed Rafi Hasan, Mohd Ariful Haque, Roy George

Abstract: The processes of classification and segmentation utilizing artificial intelligence play a vital role in the automation of disaster assessments. However, contemporary VLMs produce details that are inadequately aligned with the objectives of disaster assessment, primarily due to their deficiency in domain knowledge and the absence of a more refined descriptive process. This research presents the Vision Language Caption Enhancer (VLCE), a dedicated multimodal framework aimed at integrating external semantic knowledge from ConceptNet and WordNet to improve the captioning process. The objective is to produce disaster-specific descriptions that effectively convert raw visual data into actionable intelligence. VLCE utilizes two separate architectures: a CNN-LSTM model that incorporates a ResNet50 backbone, pretrained on EuroSat for satellite imagery (xBD dataset), and a Vision Transformer developed for UAV ...
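The abstract describes injecting external semantic knowledge from ConceptNet and WordNet into the captioning pipeline. A minimal sketch of that enrichment idea, assuming detected visual concepts are expanded with related terms before caption generation; the tiny `CONCEPT_GRAPH`, its entries, and the helper names are illustrative stand-ins for real knowledge-graph lookups, not the paper's actual API:

```python
# Hypothetical stand-in for ConceptNet/WordNet: maps a detected visual
# concept to related disaster-domain terms.
CONCEPT_GRAPH = {
    "flood": ["water damage", "inundation", "submerged road"],
    "collapsed building": ["structural damage", "debris", "rubble"],
    "wildfire": ["burn scar", "smoke plume", "charred vegetation"],
}

def enrich_concepts(detected, graph, max_per_concept=2):
    """Expand detected visual concepts with related disaster terms."""
    enriched = list(detected)
    for concept in detected:
        enriched.extend(graph.get(concept, [])[:max_per_concept])
    return enriched

def build_caption_context(detected, graph):
    """Join base and knowledge-expanded concepts into a context string
    that a caption decoder could condition on."""
    return ", ".join(enrich_concepts(detected, graph))

print(build_caption_context(["flood", "collapsed building"], CONCEPT_GRAPH))
# flood, collapsed building, water damage, inundation, structural damage, debris
```

The point of the enrichment step is that a decoder conditioned on the expanded context can produce domain-specific phrasing ("structural damage", "inundation") that a purely visual model tends to miss.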

Related Articles

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision · arXiv - AI · 4 min

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs · arXiv - AI · 3 min

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision · arXiv - AI · 4 min

[2603.26292] findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
LLMs · arXiv - AI · 3 min