AGILE: GIScience Series Open-access proceedings of the Association of Geographic Information Laboratories in Europe
AGILE GIScience Ser., 3, 5, 2022
10 Jun 2022

Spatial Disaggregation of Population Subgroups Leveraging Self-Trained Multi-Output Gradient Boosting Regression Trees

Marina Georgati1, João Monteiro2, Bruno Martins2, and Carsten Keßler3,1
  • 1Department of Planning, Aalborg University, Copenhagen, Denmark
  • 2Instituto Superior Técnico and INESC-ID, Universidade de Lisboa, Lisboa, Portugal
  • 3Department of Geodesy, Bochum University of Applied Sciences, Bochum, Germany

Keywords: spatial disaggregation, gridded population datasets, gradient tree boosting, self-supervised learning

Abstract. Accurate and consistent estimates of the present and future population distribution, at fine spatial resolution, are fundamental to support a variety of activities. However, the sampling regime, sample size, and methods used to collect census data are heterogeneous across temporal periods and/or geographic regions. Moreover, the data are usually made available only in aggregated form, to ensure privacy. To address these issues, several previous initiatives have used spatial disaggregation methods to produce high-resolution gridded datasets describing the human population distribution, although these projects have usually not considered specific population subgroups. This paper describes a spatial disaggregation method based on self-training regression models. It innovates over previous studies by simultaneously predicting disaggregated counts for multiple inter-related variables, leveraging multi-output models based on gradient tree boosting. We report on experiments for two case studies, using high-resolution data (i.e., counts for different subgroups available at a resolution of 100 meters) for the municipality of Amsterdam and the region of Greater Copenhagen. Results show that the proposed approach can capture spatial heterogeneity and the dependency on local factors, outperforming alternatives (e.g., seminal disaggregation algorithms, or approaches leveraging individual regression models for each variable) both in terms of averaged error metrics and upon visual inspection of spatial variation in the resulting maps.
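
The abstract does not spell out the procedure, so the following is only a minimal sketch of what a self-training disaggregation loop for several subgroups might look like, under stated assumptions. It assumes the loop starts from uniform areal weighting, refits a multi-output regressor on its own current estimates, and rescales the predictions so that cells in each source zone sum to the known zone counts. All names (`self_train`, `rescale_to_zone_totals`, `proportional_fit_predict`) are hypothetical, and the one-covariate proportional model is a simple stand-in, not the gradient tree boosting model used in the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]  # rows: grid cells, columns: population subgroups


def rescale_to_zone_totals(pred: Matrix, zone_of_cell: Sequence[str],
                           zone_totals: Dict[str, List[float]]) -> Matrix:
    """Scale cell estimates so each zone sums to its known count, per subgroup."""
    n_groups = len(pred[0])
    cells_in_zone = Counter(zone_of_cell)
    sums = {z: [0.0] * n_groups for z in zone_totals}
    for row, z in zip(pred, zone_of_cell):
        for g in range(n_groups):
            sums[z][g] += max(row[g], 0.0)  # negative predictions are clipped to zero
    out = []
    for row, z in zip(pred, zone_of_cell):
        new_row = []
        for g in range(n_groups):
            if sums[z][g] > 0.0:
                new_row.append(max(row[g], 0.0) * zone_totals[z][g] / sums[z][g])
            else:
                # No signal in this zone: fall back to a uniform split.
                new_row.append(zone_totals[z][g] / cells_in_zone[z])
        out.append(new_row)
    return out


def proportional_fit_predict(x: Sequence[float], y: Matrix) -> Matrix:
    """Stand-in multi-output model (assumption): one least-squares slope
    through the origin per subgroup, fitted on a single covariate.
    Assumes at least one covariate value is non-zero."""
    n_groups = len(y[0])
    sxx = sum(v * v for v in x)
    slopes = [sum(v * row[g] for v, row in zip(x, y)) / sxx for g in range(n_groups)]
    return [[s * v for s in slopes] for v in x]


def self_train(x: Sequence[float], zone_of_cell: Sequence[str],
               zone_totals: Dict[str, List[float]],
               fit_predict: Callable[[Sequence[float], Matrix], Matrix],
               n_iter: int = 3) -> Matrix:
    """Iteratively refit the model on its own mass-preserving estimates."""
    cells_in_zone = Counter(zone_of_cell)
    # Initial estimate: spread each zone's counts uniformly over its cells.
    est = [[t / cells_in_zone[z] for t in zone_totals[z]] for z in zone_of_cell]
    for _ in range(n_iter):
        pred = fit_predict(x, est)
        est = rescale_to_zone_totals(pred, zone_of_cell, zone_totals)
    return est


# Toy data (invented for illustration): four cells in two zones, two subgroups.
density = [1.0, 3.0, 2.0, 2.0]          # hypothetical covariate per cell
zones = ["A", "A", "B", "B"]
totals = {"A": [40.0, 8.0], "B": [20.0, 20.0]}
estimates = self_train(density, zones, totals, proportional_fit_predict)
```

Passing the regressor as a callable keeps the loop independent of the model, so a multi-output gradient boosting implementation could be swapped in for the stand-in without changing the rescaling step.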
