Semantic complexity of geographic questions - A comparison in terms of conceptual transformations of answers

Nyamsuren, Enkhbold; Xu, Haiqi; Top, Eric J.; Scheider, Simon; Steenbergen, Niels

doi:https://doi.org/10.5194/agile-giss-4-10-2023

Articles | Volume 4

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

06 Jun 2023

Keywords: geographical information systems, question answering, question corpora, core concept transformation, question complexity

Abstract. There is an increasing trend of applying AI-based automated methods to geoscience problems. An important example is geographic question answering (geoQA) aimed at generating answers via GIS workflows rather than retrieving a factual answer. However, developing, testing, and validating such generative geoQA systems requires a representative question corpus. We compare five manually constructed geographical question corpora, GeoAnQu, Giki, GeoCLEF, GeoQuestions201, and Geoquery, by applying a conceptual transformation parser. The parser infers geo-analytical concepts and their transformations from a geographical question, akin to an abstract GIS workflow. The transformations thus represent the complexity of the geo-analytical operations necessary to answer a question. By estimating the variety of concepts and the number of transformations for each corpus, we can compare the five corpora on the level of geo-analytical complexity, which cannot be done with purely NLP-based methods. Results indicate that the questions in GeoAnQu, which were compiled from GIS literature, require more numerous as well as more diverse geo-analytical operations than the questions in the four other corpora. Furthermore, constructing a sufficiently representative corpus (including GIS questions) may require targeting a uniquely qualified group of users as its source. In contrast, sampling questions from large-scale online repositories such as those of Google, Microsoft, and Yahoo may not provide the quality necessary for testing generative geoQA systems.
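The corpus comparison described above rests on two per-corpus measures: how many transformations the questions require and how many distinct concepts they involve. The following is a minimal Python sketch of that bookkeeping. It assumes each parsed question has already been reduced to a list of hypothetical `Transformation` records (input concepts to one output concept). The record type, the `corpus_complexity` function, the concept labels, and the toy data are illustrative assumptions, not the parser's actual output format or the paper's results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transformation:
    """One abstract workflow step: input concepts -> output concept.

    Hypothetical encoding chosen for illustration; the real parser's
    output format may differ.
    """
    inputs: tuple[str, ...]
    output: str


# Assumption: a parsed question is simply a list of transformation steps.
ParsedQuestion = list[Transformation]


def corpus_complexity(questions: list[ParsedQuestion]) -> dict[str, float]:
    """Summarize a corpus by transformation count and concept variety."""
    # Number of transformations per question approximates operational complexity.
    counts = [len(q) for q in questions]
    # Every concept appearing as an input or output of any step in the corpus.
    concepts = {c for q in questions for t in q for c in (*t.inputs, t.output)}
    return {
        "mean_transformations": mean(counts) if counts else 0.0,
        "max_transformations": max(counts, default=0),
        "distinct_concepts": len(concepts),
    }


# Toy, hand-made corpora for illustration only; not data from the paper.
toy_analytical = [
    [Transformation(("field",), "object"),
     Transformation(("object", "object"), "amount")],
    [Transformation(("object",), "field"),
     Transformation(("field",), "amount"),
     Transformation(("amount", "amount"), "proportion")],
]
toy_factoid = [
    [Transformation(("object",), "object")],
    [Transformation(("object",), "amount")],
]

if __name__ == "__main__":
    print(corpus_complexity(toy_analytical))
    print(corpus_complexity(toy_factoid))
```

Here the mean and maximum transformation counts stand in for the number of operations a corpus demands, and the distinct-concept count stands in for their diversity. The paper's exact measures may be defined differently.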