Should We Ask an LLM? Evaluating Toponym Disambiguation across Administrative Levels
Keywords: LLMs, toponyms, geoparsing, geocoding, disambiguation
Abstract. Toponym disambiguation, the task of determining which real-world location a place name refers to, is a critical step in geoparsing pipelines. Existing evaluations, however, rely on distance-based metrics that conflate disambiguation quality with the behavior of downstream geocoders. We propose evaluating disambiguation as a standalone task: we prompt nine LLMs to predict administrative containment (ADM0 to ADM2) from textual context and score the predictions directly with precision, recall, and F1 against GADM-derived labels on the LGL corpus. Performance declines systematically as administrative granularity becomes finer, mid-sized models outperform the largest tested model, and recurring failure cases cluster around geopolitically complex regions. These findings suggest that feeding fine-grained LLM disambiguation outputs to geocoders may harm rather than help performance.
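To make the per-level scoring concrete, the following is a minimal sketch of how precision, recall, and F1 could be computed separately for ADM0, ADM1, and ADM2. It is an illustration only, not the evaluation code used in this work. It assumes that a model may abstain at a level (represented as `None`), that precision is computed over answered items and recall over all gold items, and that labels are matched by case-insensitive string equality. The function name `score_level` and the three example toponyms with their labels are hypothetical and are not drawn from LGL or GADM.

```python
from typing import Optional, Sequence

LEVELS = ("ADM0", "ADM1", "ADM2")


def score_level(gold: Sequence[str], pred: Sequence[Optional[str]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one administrative level.

    Assumption: None means the model gave no answer at this level, so the
    item counts against recall but not against precision.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    answered = sum(p is not None for p in pred)
    correct = sum(
        p is not None and p.casefold() == g.casefold()
        for g, p in zip(gold, pred)
    )
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted (ADM0, ADM1, ADM2) labels for three toponyms.
gold_labels = [
    ("United States", "Texas", "Lamar"),
    ("Canada", "Ontario", "Middlesex"),
    ("United States", "Illinois", "Sangamon"),
]
pred_labels = [
    ("United States", "Texas", None),           # abstains at ADM2
    ("Canada", "Ontario", "Middlesex"),         # correct at every level
    ("United States", "Missouri", "Greene"),    # wrong below ADM0
]

# Score each level independently by slicing out one column at a time.
results = {
    level: score_level(
        [g[i] for g in gold_labels],
        [p[i] for p in pred_labels],
    )
    for i, level in enumerate(LEVELS)
}

for level, (p, r, f) in results.items():
    print(f"{level}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy example the scores fall from ADM0 to ADM2, which mirrors in miniature the kind of decline with granularity that the abstract reports. Scoring each level on its own keeps the measurement independent of any geocoder and of coordinate distance.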