AGILE-GISS

AGILE: GIScience Series

AGILE GIScience Ser.

2700-8150

Copernicus Publications

Göttingen, Germany

10.5194/agile-giss-7-14-2026

Investigating the Generalizability of Segment Anything Model for Large-Scale Geospatial Segmentation

Wejdene Mansour¹ (https://orcid.org/0009-0008-4362-2092)

Paul Walther¹

Hao Li²

Martin Werner

¹ Department of Aerospace and Geodesy, TUM School of Engineering and Design, Technical University of Munich, Germany

² Department of Geography, National University of Singapore, Singapore

10 06 2026

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://agile-giss.copernicus.org/articles/7/14/2026/agile-giss-7-14-2026.html

The full text article is available as a PDF file from https://agile-giss.copernicus.org/articles/7/14/2026/agile-giss-7-14-2026.pdf

Foundation Models (FMs) are a promising approach in multimodal artificial intelligence, as they provide foundational task knowledge across computer vision, language understanding, and related domains. Despite their success, the extent to which FMs generalize to domain-specific tasks remains unclear, especially in Earth System Sciences (ESS). In this work, we investigate the geographical and task-level generalizability of the Segment Anything Model (SAM) and the vision–language FMs CLIP and Grounding DINO across two distinct vision tasks: 1) building footprint segmentation from high-quality airborne images at 40 cm ground sampling distance (GSD) and 2) surface water segmentation from Sentinel-2 imagery at about 10 m GSD. We explore strategies to improve the zero-shot applicability of the general-purpose SAM by combining it with other pre-trained FMs for detection and classification, and we evaluate the performance gains achievable with minimal computational overhead through few-shot adapters on these datasets. Furthermore, we assess whether the remote-sensing-specific training of RemoteCLIP and RemoteSAM leads to meaningful improvements over their general-purpose counterparts in large-scale geospatial segmentation. Overall, we conclude that domain-specific FMs can provide performance gains in certain settings, but are neither required nor always useful when compared with lightweight adaptation strategies and mixtures of different general models. This suggests that a more economical pathway might be to increase the amount of remote sensing data used to train general FMs instead of training dedicated models specifically for ESS.