[2603.02080] From Pixels to Patches: Pooling Strategies for Earth Embeddings
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.02080 (cs)
[Submitted on 2 Mar 2026]

Title: From Pixels to Patches: Pooling Strategies for Earth Embeddings
Authors: Isaac Corley, Caleb Robinson, Inbal Becker-Reshef, Juan M. Lavista Ferres

Abstract: As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can reduce accuracy by more than 10% under spatial shift. To evaluate this effect, we introduce EuroSAT-Embed: 81,000 embedding GeoTIFFs derived from three foundation models: AlphaEarth, OlmoEarth, and Tessera. We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Our results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. We recommend Generalized Mean Pooling (GeM) as a drop-in replacement for mean pooling: it improves accuracy without increasing embedding dimensionality. For maximum accuracy, Stats pooling (concatenation of min/max/mean/std pooling) per...
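
To make the three pooling schemes named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the patch size (64 x 64 pixels), the embedding width (16), the GeM exponent `p=3.0`, and the clamping of values to a small positive `eps` before exponentiation are all assumptions chosen for illustration; the abstract does not say how the paper treats negative embedding values or which exponent it uses.

```python
import numpy as np


def mean_pool(pixels: np.ndarray) -> np.ndarray:
    """Average N pixel embeddings of width D into one D-vector."""
    return pixels.mean(axis=0)


def gem_pool(pixels: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean pooling: (mean(x^p))^(1/p) per dimension.

    Assumption: values are clamped to >= eps first so the fractional
    root is defined. p=1 gives the mean; larger p moves toward the max.
    """
    clamped = np.clip(pixels, eps, None)
    return np.mean(clamped ** p, axis=0) ** (1.0 / p)


def stats_pool(pixels: np.ndarray) -> np.ndarray:
    """Concatenate per-dimension min, max, mean, and std into a 4*D vector."""
    return np.concatenate(
        [
            pixels.min(axis=0),
            pixels.max(axis=0),
            pixels.mean(axis=0),
            pixels.std(axis=0),
        ]
    )


# Hypothetical patch: 64 x 64 pixels, 16-dimensional embeddings (synthetic data).
rng = np.random.default_rng(0)
patch = rng.random((64, 64, 16))
pixels = patch.reshape(-1, patch.shape[-1])  # flatten to (N, D)

print(mean_pool(pixels).shape)   # D values
print(gem_pool(pixels).shape)    # D values, same width as mean pooling
print(stats_pool(pixels).shape)  # 4 * D values
```

In this sketch GeM keeps the output width at D, which matches the abstract's description of it as a drop-in replacement, while Stats pooling returns a vector four times as wide because it concatenates four D-dimensional summaries.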