Should We Ask an LLM? Evaluating Toponym Disambiguation across Administrative Levels

Welscher, Franz; Smith, Paddy; Leppämäki, Tatu; Ilyankou, Ilya

doi:10.5194/agile-giss-7-46-2026

Articles | Volume 7

https://doi.org/10.5194/agile-giss-7-46-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

10 Jun 2026

Keywords: LLMs, toponyms, geoparsing, geocoding, disambiguation

Abstract. Toponym disambiguation, determining which real-world location a place name refers to, is a critical step in geoparsing pipelines, yet existing evaluations mix disambiguation quality with the behavior of downstream geocoders through distance-based metrics. We propose evaluating disambiguation as a standalone task by prompting nine LLMs to predict administrative containment (ADM0 to ADM2) from textual context, scoring predictions directly with precision, recall, and F1 against GADM-derived labels on the LGL corpus. Performance declines systematically with administrative granularity, mid-sized models outperform the largest tested model, and recurring failure cases cluster around geopolitically complex regions. These findings suggest that feeding fine-grained LLM disambiguation outputs to geocoders may harm rather than help performance.
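The per-level scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that a model may abstain (represented here as `None`), that precision is computed over answered toponyms only, and that recall is computed over all gold toponyms. The function name `prf1` and the toy labels are hypothetical and are not drawn from the LGL corpus or GADM.

```python
from typing import Optional, Sequence


def prf1(gold: Sequence[str], pred: Sequence[Optional[str]]) -> tuple[float, float, float]:
    """Precision, recall and F1 for one administrative level.

    Assumption (not from the paper): a prediction of None means the model
    abstained; abstentions lower recall but do not count against precision.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    answered = sum(p is not None for p in pred)
    correct = sum(p is not None and p == g for g, p in zip(gold, pred))
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels for three toponyms at each administrative level.
gold = {
    "ADM0": ["USA", "USA", "GBR"],
    "ADM1": ["Texas", "Georgia", "England"],
    "ADM2": ["Lamar", "Fulton", "Greater London"],
}
pred = {
    "ADM0": ["USA", "USA", "GBR"],
    "ADM1": ["Texas", "Georgia", None],   # one abstention
    "ADM2": ["Lamar", None, "Kent"],      # one abstention, one error
}

# Score each level independently, so no geocoder or distance metric is involved.
scores = {level: prf1(gold[level], pred[level]) for level in gold}
for level, (p, r, f) in scores.items():
    print(f"{level}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Scoring each level separately keeps the evaluation independent of any downstream geocoder, and in this toy data it shows the same kind of decline from ADM0 to ADM2 that the abstract reports.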