[2603.02181] Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.02181 (cs)
[Submitted on 2 Mar 2026]

Title: Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta
Authors: Quoc-Khang Tran, Minh-Thien Nguyen, Nguyen-Khang Pham

Abstract: The classification of Intangible Cultural Heritage (ICH) images in the Mekong Delta poses unique challenges due to limited annotated data, high visual similarity among classes, and domain heterogeneity. In such low-resource settings, conventional deep learning models often suffer from high variance or overfit to spurious correlations, leading to poor generalization. To address these limitations, we propose a robust framework that integrates the hybrid CoAtNet architecture with model soups, a lightweight weight-space ensembling technique that averages checkpoints from a single training trajectory without increasing inference cost. CoAtNet captures both local and global patterns through stage-wise fusion of convolution and self-attention. We apply two ensembling strategies - greedy and uniform soup - to selectively combine diverse checkpoints into a final model. Beyond performance improvements, we analyze the ensembling effect through the lens of bias-variance decomposition. Our findings sh...
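
The abstract describes model soups as averaging checkpoint weights, so the final model costs no more at inference than a single network. The following is a minimal sketch of the two strategies it names, uniform and greedy soup, and not the authors' implementation. The names `uniform_soup`, `greedy_soup`, and `evaluate`, and the plain dict-of-lists weight format, are assumptions made for illustration. The greedy variant follows the commonly described recipe: rank checkpoints by held-out score, then keep a checkpoint only if adding it does not lower the score of the averaged model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def uniform_soup(checkpoints: Sequence[Weights]) -> Weights:
    """Average every parameter element-wise across all checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }


def greedy_soup(
    checkpoints: Sequence[Weights],
    evaluate: Callable[[Weights], float],
) -> Weights:
    """Add checkpoints one by one, keeping those that do not hurt the score.

    `evaluate` is an assumed callback that returns a held-out score
    (higher is better) for a given set of weights.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    # Try the individually strongest checkpoints first.
    ranked = sorted(checkpoints, key=evaluate, reverse=True)
    ingredients = [ranked[0]]
    best_score = evaluate(ranked[0])
    for candidate in ranked[1:]:
        trial = uniform_soup(ingredients + [candidate])
        score = evaluate(trial)
        if score >= best_score:  # keep only if the averaged model is no worse
            ingredients.append(candidate)
            best_score = score
    return uniform_soup(ingredients)
```

In either case the result is a single set of weights that can be loaded into one network, which is why the inference cost matches that of an individual checkpoint. With a real framework the same averaging would be applied to each tensor in the saved checkpoints, and `evaluate` would load the weights and measure validation accuracy.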