[2601.08026] FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures

Summary

The paper presents FigEx2, a visual-conditioned framework that localizes the panels of a scientific compound figure and generates a caption for each panel directly from the image.

Why It Matters

Captions for compound figures are often missing or summarize only the figure as a whole, which makes individual panels hard to interpret. FigEx2 targets this gap by producing panel-level captions. Alongside the curated BioSci-Fig-Cap benchmark, the authors provide cross-disciplinary test suites in physics and chemistry.

Key Takeaways

  • FigEx2 localizes panels and generates detailed captions from compound figures.
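  • Introduces a noise-aware gated fusion module that filters token-level features to stabilize the detection query space.
  • Trains in stages, combining supervised learning with reinforcement learning driven by CLIP-based alignment and BERTScore-based semantic rewards.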
  • Achieves high detection accuracy and outperforms existing models in key metrics.
  • Demonstrates strong zero-shot transferability to new scientific domains.
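
The gated fusion idea in the takeaways can be illustrated with a small sketch. This is not the authors' module: the page gives no architectural details, so the scalar per-token gate, the concatenated gate input, the mean pooling, and the names gated_fusion, gate_weights, and gate_bias are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    query: Sequence[float],
    tokens: Sequence[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Fuse token features into a detection query through scalar gates.

    Hypothetical formulation (an assumption, not taken from the paper):
    gate_i = sigmoid(w . [query; token_i] + b)
    fused  = query + mean_i(gate_i * token_i)
    """
    fused = list(query)
    if not tokens:
        return fused
    for token in tokens:
        # The gate sees both the query and the token it is about to admit.
        concat = list(query) + list(token)
        score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
        gate = stable_sigmoid(score)
        # A gate near 0 suppresses the token; a gate near 1 passes it through.
        for i in range(len(fused)):
            fused[i] += gate * token[i] / len(tokens)
    return fused
```

With a strongly negative gate score the token contributes almost nothing and the query is left essentially unchanged, which is the sense in which a gate of this kind can keep noisy caption tokens from perturbing the detection queries.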

Computer Science > Computer Vision and Pattern Recognition

arXiv:2601.08026 (cs)
[Submitted on 12 Jan 2026 (v1), last revised 25 Feb 2026 (this version, v3)]

Title: FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures

Authors: Jifeng Song, Arun Das, Pan Wang, Hui Ji, Kun Zhao, Yufei Huang

Abstract: Scientific compound figures combine multiple labeled panels into a single image, but captions in real pipelines are often missing or only provide figure-level summaries, making panel-level understanding difficult. In this paper, we propose FigEx2, a visual-conditioned framework that localizes panels and generates panel-wise captions directly from the compound figure. To mitigate the impact of diverse phrasing in open-ended captioning, we introduce a noise-aware gated fusion module that adaptively filters token-level features to stabilize the detection query space. Furthermore, we employ a staged optimization strategy combining supervised learning with reinforcement learning (RL), utilizing CLIP-based alignment and BERTScore-based semantic rewards to enforce strict multimodal consistency. To support high-quality supervision, we curate BioSci-Fig-Cap, a refined benchmark for panel-level grounding, alongside cross-disciplinary test suites in physics and chemistry. Exp...
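
As a rough illustration of how an alignment term and a semantic term might be combined into one RL reward, consider the sketch below. The abstract does not give the reward formula, so the convex combination, the alignment_weight parameter, and the function names are assumptions. The embeddings and the semantic F1 score are taken as precomputed inputs rather than produced by any particular CLIP or BERTScore library call.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_reward(
    image_emb: Sequence[float],
    caption_emb: Sequence[float],
    semantic_f1: float,
    alignment_weight: float = 0.5,
) -> float:
    """Blend an image-text alignment score with a semantic similarity score.

    Hypothetical formulation (an assumption, not the paper's reward):
    reward = w * cos(image_emb, caption_emb) + (1 - w) * semantic_f1

    image_emb and caption_emb stand in for CLIP-style embeddings of a panel
    crop and its generated caption; semantic_f1 stands in for a
    BERTScore-style F1 against a reference caption.
    """
    if not 0.0 <= alignment_weight <= 1.0:
        raise ValueError("alignment_weight must lie in [0, 1]")
    alignment = cosine_similarity(image_emb, caption_emb)
    return alignment_weight * alignment + (1.0 - alignment_weight) * semantic_f1
```

A caption that is fluent but describes the wrong panel would score well on the semantic term only if it resembles the reference, and poorly on the alignment term, so a blended reward of this shape penalizes either kind of mismatch.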

