[2602.01696] Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
Summary
This paper presents CMAFNet (Cross-Modal Alignment and Fusion Network), an RGB-D detector for small transmission-line defects that follows a purify-then-fuse paradigm, with reported performance improvements over existing methods.
Why It Matters
The detection of small defects in transmission lines is critical for infrastructure maintenance and safety. This research addresses the challenges posed by small-scale defects and complex backgrounds, offering a more effective solution that can enhance automated inspections, particularly in UAV applications.
Key Takeaways
- CMAFNet integrates RGB and depth data to improve defect detection accuracy.
- The proposed method outperforms existing baseline models by notable margins.
- A lightweight version of CMAFNet achieves high performance with lower computational costs.
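- The approach combines dictionary-based feature purification via a learned codebook (the Semantic Recomposition Module) with partial-channel attention for global spatial context (the Contextual Semantic Integration Framework).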
- This research contributes to the field of automated inspection technologies, particularly in UAV applications.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.01696 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
Authors: Jiaming Cui, Wenqiang Li, Shuai Zhou, Ruifeng Qin, Feng Shen
Abstract: Transmission line defect detection remains challenging for automated UAV inspection due to the dominance of small-scale defects, complex backgrounds, and illumination variations. Existing RGB-based detectors, despite recent progress, struggle to distinguish geometrically subtle defects from visually similar background structures under limited chromatic contrast. This paper proposes CMAFNet, a Cross-Modal Alignment and Fusion Network that integrates RGB appearance and depth geometry through a principled purify-then-fuse paradigm. CMAFNet consists of a Semantic Recomposition Module that performs dictionary-based feature purification via a learned codebook to suppress modality-specific noise while preserving defect-discriminative information, and a Contextual Semantic Integration Framework that captures global spatial dependencies using partial-channel attention to enhance structural semantic reasoning. Position-wise normalization within the purification stage ...
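To make the purify-then-fuse idea in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. Several details are assumptions made only for illustration: purification is modelled as a softmax-weighted reconstruction from codebook atoms, "position-wise normalization" is read as normalizing each spatial position's channel vector, each modality gets its own codebook, fusion is channel concatenation, and partial-channel attention is plain dot-product self-attention applied to the first `n_attn` channels while the rest pass through. All function names and the toy data are hypothetical.

```python
import math

def position_norm(x, eps=1e-6):
    # Assumed reading of "position-wise": normalize one position's channel vector.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def purify(features, codebook):
    # Rebuild each position's feature as a convex combination of codebook atoms,
    # so anything the codebook cannot express is discarded (assumed mechanism).
    out = []
    for x in features:
        x = position_norm(x)
        weights = softmax([dot(x, atom) for atom in codebook])
        out.append([sum(w * atom[c] for w, atom in zip(weights, codebook))
                    for c in range(len(x))])
    return out

def partial_channel_attention(features, n_attn):
    # Self-attention across positions on the first n_attn channels only;
    # the remaining channels are passed through unchanged.
    heads = [x[:n_attn] for x in features]
    scale = math.sqrt(n_attn)
    out = []
    for i, x in enumerate(features):
        weights = softmax([dot(heads[i], k) / scale for k in heads])
        mixed = [sum(w * v[c] for w, v in zip(weights, heads))
                 for c in range(n_attn)]
        out.append(mixed + list(x[n_attn:]))
    return out

def purify_then_fuse(rgb, depth, rgb_codebook, depth_codebook, n_attn):
    rgb_p = purify(rgb, rgb_codebook)
    depth_p = purify(depth, depth_codebook)
    # Fusion by channel concatenation (list "+" joins the two channel vectors).
    fused = [r + d for r, d in zip(rgb_p, depth_p)]
    return partial_channel_attention(fused, n_attn)

# Toy data: 3 spatial positions, 4 channels per modality (values are made up).
rgb = [[0.2, 0.9, 0.1, 0.4], [0.8, 0.1, 0.5, 0.3], [0.3, 0.3, 0.9, 0.1]]
depth = [[0.5, 0.5, 0.1, 0.9], [0.1, 0.7, 0.2, 0.2], [0.6, 0.0, 0.4, 0.8]]
rgb_codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
depth_codebook = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

fused = purify_then_fuse(rgb, depth, rgb_codebook, depth_codebook, n_attn=4)
print(len(fused), len(fused[0]))  # 3 8
```

In this sketch only the purified RGB channels are mixed across positions, which keeps the attention cost proportional to `n_attn` rather than to the full channel count; a real model would typically use a tensor library and learned projections rather than raw dot products.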