Investigating Moran’s I Properties for Spatial Machine Learning: A Preliminary Analysis
Keywords: spatial machine learning, Moran’s I, random forest, spatial autocorrelation, model evaluation
Abstract. This study explores the use of Moran’s I, a measure of spatial autocorrelation, for evaluating spatial machine learning models, focusing on random forest (RF) models applied to simulated raster data with varying spatial structures. We simulate 300 scenarios (raster datasets) with spatial autocorrelation ranges of 10, 50, and 100, and assess model performance using root mean square error (RMSE) and Moran’s I of the residuals, computed across the entire raster and separately for the training and testing samples. Within our experimental setup, Moran’s I of the residuals is affected by the spatial structure of the data, with higher values observed for datasets with greater autocorrelation ranges. RMSE and Moran’s I are only weakly correlated, suggesting that Moran’s I can offer supplementary insight beyond RMSE into the spatial quality of models. However, Moran’s I is also sensitive to sample size and spatial proximity, which can lead to misleading interpretations of model quality. These findings underscore the limitations of relying solely on Moran’s I in spatial machine learning applications and call for further investigation of its dependence on sample size and spatial distance to improve spatial model assessment.
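For readers unfamiliar with the statistic, the following is a minimal sketch of global Moran’s I computed on a full raster of residuals (observed minus predicted values). It assumes binary rook-contiguity weights (each cell is a neighbour of the cells directly above, below, left, and right of it); the abstract does not state which neighbourhood definition or software the study used, and the function name `morans_i_rook` is hypothetical.

```python
import numpy as np

def morans_i_rook(grid):
    """Global Moran's I for a 2D array, assuming binary rook-contiguity weights.

    I = (N / W) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    where z are deviations from the mean and W is the sum of all weights.
    """
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant raster")

    # Cross-products of horizontally and vertically adjacent cells.
    horiz = (z[:, :-1] * z[:, 1:]).sum()
    vert = (z[:-1, :] * z[1:, :]).sum()

    # Each adjacent pair appears twice in the weight matrix (w_ij and w_ji),
    # so the factor of 2 cancels between the numerator and W.
    n_pairs = z[:, :-1].size + z[:-1, :].size
    return (z.size / n_pairs) * (horiz + vert) / denom

# Illustrative inputs (not data from the study):
checkerboard = np.indices((4, 4)).sum(axis=0) % 2   # alternating 0/1 cells
gradient = np.tile(np.arange(10.0), (10, 1))        # smooth left-to-right trend

i_checkerboard = morans_i_rook(checkerboard)  # strong negative autocorrelation
i_gradient = morans_i_rook(gradient)          # strong positive autocorrelation
```

In this sketch, values near 0 indicate residuals without spatial pattern, positive values indicate that similar residuals cluster together, and negative values indicate that neighbouring residuals tend to differ. Applying the statistic to scattered training or testing samples rather than a full grid would require a different weights definition (for example, distance-based neighbours), which is not shown here.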