Comparing supervised learning algorithms for Spatial Nominal Entity recognition

Medad, Amine; Gaio, Mauro; Moncla, Ludovic; Mustière, Sébastien; Le Nir, Yannick

doi:https://doi.org/10.5194/agile-giss-1-15-2020

Articles | Volume 1

© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

15 Jul 2020

Keywords: Geographic Information Retrieval, Natural Language Processing, Nominal Entity Recognition

Abstract. Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a natural language processing task known as Word Sense Disambiguation. Recognising and categorising named and nominal entities is an essential step for Word Sense Disambiguation methods. Up to now, entity recognition and categorisation systems have mainly focused on the annotation, categorisation and identification of named entities. This paper focuses on the annotation and identification of spatial nominal entities. We explore the combination of the Transfer Learning principle with supervised learning algorithms in order to build a system that detects spatial nominal entities. For this purpose, different supervised learning algorithms are evaluated with three different context sizes on two manually annotated datasets built from Wikipedia articles and hiking description texts. Each algorithm was selected for one or more specific properties that are potentially useful for this problem. The results of the first phase of experiments reveal that the selected algorithms perform similarly in terms of their ability to detect spatial nominal entities. The study also confirms the importance of the size of the window used to describe the context when word embeddings are used to represent the semantics of each word.
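As an illustration of what a context window over word embeddings can look like, the following minimal sketch concatenates the embedding of a target word with those of its neighbours to form the input of a supervised classifier. It is not the authors' implementation: the function name `window_features`, the toy two-dimensional embeddings, the zero-padding choice and the window sizes 1, 2 and 3 are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence

# Toy 2-dimensional embeddings. The values are hypothetical and stand in
# for pre-trained word embeddings (the transfer learning ingredient).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "follow": [0.1, 0.3],
    "the": [0.0, 0.1],
    "path": [0.8, 0.2],
    "to": [0.0, 0.2],
    "lake": [0.9, 0.4],
}
DIM = 2
PAD = [0.0] * DIM  # used for positions outside the sentence and unknown words


def window_features(tokens: Sequence[str], index: int, size: int) -> List[float]:
    """Concatenate the embeddings of tokens[index - size .. index + size]."""
    features: List[float] = []
    for pos in range(index - size, index + size + 1):
        if 0 <= pos < len(tokens):
            features.extend(TOY_EMBEDDINGS.get(tokens[pos].lower(), PAD))
        else:
            features.extend(PAD)  # assumption: zero-pad at sentence boundaries
    return features


sentence = ["Follow", "the", "path", "to", "the", "lake"]
target = 2  # index of "path", a candidate spatial nominal entity

# Assumed window sizes; the feature length grows as (2 * size + 1) * DIM.
for size in (1, 2, 3):
    vector = window_features(sentence, target, size)
    print(size, len(vector))
```

Each resulting vector, paired with a manual label such as "spatial" or "not spatial", could then be passed to any supervised learning algorithm. A larger window gives the classifier more context at the cost of a longer input vector.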