Land use influence on ambient PM2.5 and ammonia concentrations: Correlation analyses in the Lombardy region, Italy

Air pollution is identified as the primary environmental risk to health worldwide. Although most anthropic emissions are due to combustion processes, intensive farming activities may also contribute significantly, especially as a source of particulate matter 2.5 and ammonia. Investigations on the dynamics of particulate matter and its precursors, identifying the most relevant environmental factors influencing their emissions, are critical to improving local and regional air quality policies. This work presents an analysis of the correlation between particulate matter 2.5 and ammonia concentrations, obtained from the Copernicus Atmosphere Monitoring Service, and local land use characteristics, to investigate the influence of agricultural activities on the space-time pollutant concentration patterns. The selected study area is the Lombardy region, northern Italy. Correlation is evaluated through Spearman’s coefficient. Agricultural areas emerged as a significant factor for high ammonia concentrations, while particulate matter 2.5 was strongly correlated with built-up areas. Natural areas emerged instead as a protective factor for both pollutants. Results provide data-driven evidence of the land use effect on air quality, also quantifying such effects in terms of the magnitude of the correlation coefficients.


Introduction
Air pollution is identified as the most significant environmental risk to health worldwide and, accordingly, is directly considered in several United Nations Sustainable Development Goals (e.g., 3.9 and 11.6) (Rafaj et al., 2018).
Air pollutants and greenhouse gases have increased their concentration in the troposphere in the last century, due to the intensification of energy production, industrial processes, and transport. Although most anthropic emissions are due to combustion processes, intensive farming activities also contribute significantly (McDuffie et al., 2020). In rural areas, this sector prevails in the emission of a few critical primary pollutants, in particular nitrogen compounds such as nitrogen oxides (NOX) and ammonia (NH3). Direct exposure to such pollutants has a marginal impact on human health, due to their relatively low toxicity (at an average ambient concentration) and their short residence time in the atmosphere (Zhu et al., 2015), although it was hypothesized that they may have had a role in facilitating the spread of COVID-19 in the first outbreak (Gianquintieri et al., 2021). However, atmospheric NH3 reacts with other gaseous emissions, including sulphur dioxide (SO2) and NOX, promoting the generation of both coarse and fine secondary particulate matter. Inhalation of the fine fraction of particulate matter with diameters of 2.5 µm or smaller (PM2.5) is associated with severe negative impacts on human health, including respiratory tract diseases, undermined lung function, and raised morbidity and mortality of cardiopulmonary diseases (Xing et al., 2015). Contrary to other farming-related emissions, PM2.5 resides in the atmosphere for days to a week, thus enhancing the risk of exposure for the population (Awe et al., 2022).
Investigations on the dynamics of PM2.5 and its precursors, as well as the identification of the most relevant environmental factors influencing their emissions, are critical to improving local and regional air quality policies. Nevertheless, this kind of study requires space-time-resolved information on PM2.5 concentration, which is not generally available from ground-based measurements (air quality sensors), especially in rural areas (Castell et al., 2017). The availability of open, multi-temporal and spatially continuous data to describe both air pollutant concentrations and the possible influencing environmental factors is therefore a key asset for such analyses.
Accordingly, the aim of this work was to study the correlation between PM2.5 and NH3 concentrations, obtained from the Copernicus Atmosphere Monitoring Service (CAMS) (Peuch et al., 2022), and local land use characteristics, to investigate the influence of agricultural activities on the space-time concentration patterns for the selected pollutants (Haider et al., 2020). The territory of interest is the Lombardy region (Northern Italy), which is the most populated Italian region and also has the highest national agricultural production. Air quality in Lombardy is a primary concern due to its high urbanization/industrialization, as well as to its geomorphological conditions (Po River valley), which are unfavourable for air pollutant dispersion. A comparison of observed correlations between agricultural and urban areas will also be provided and discussed.

Study Setting
The selected area of interest is the Lombardy region in northern Italy (23,863 km²), with a resident population of 10.6 million. The pollutants considered for the analysis are particulate matter < 2.5 µm (PM2.5), which represents a major health risk, and ammonia (NH3), a precursor of PM2.5, known to be generated by agricultural activity. Space-time series of their concentrations [µg/m3] for the study area are retrieved from the Copernicus Atmosphere Monitoring Service (CAMS). Ensemble medians from CAMS European air quality forecasts (analysis dataset at surface level) were used (https://ads.atmosphere.copernicus.eu). Data are openly distributed as multi-temporal grids, using the NetCDF format, a widely adopted set of libraries for structuring and distributing array-like scientific data (https://www.unidata.ucar.edu/software/netcdf), with a spatial resolution of 0.1° and a time resolution of 1 hour.
For this study, pollutant concentration data were aggregated in time with a resolution of one week, targeting peaks of pollution concentration that are prolonged in time; pollution grids were downscaled to a cell size of 0.06° (≃5.5 km) to enlarge the data sample for correlation analyses. The analysis period included the years 2020 and 2021, selecting the months of January, March, April, October, and November. In the Lombardy region, as indicated by the regional agency for environmental protection ARPA Lombardia (https://www.arpalombardia.it), these periods coincide with the fertilization of agricultural areas, when most of the nitrogen-rich animal manure is applied on crops. The month of January, when manuring is limited by the regional air quality regulation framework, was considered to also investigate the possible influence of domestic heating from dense urban areas on PM2.5 emissions. It is worth noting that the years 2020 and 2021 were characterized by lockdowns due to the COVID-19 pandemic, which had little to no impact on agricultural activity but limited other human activities to some extent; however, only the months of March and April 2020 were characterized by a level of restriction that could have had an impact on pollutant emissions, while working activities were carried out almost regularly in the other considered periods. The variability in the data generated by those events is not considered to hinder the robustness of the analysis.
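The weekly aggregation step can be illustrated with a minimal sketch. It assumes that the hourly CAMS grids have already been loaded into a NumPy array with shape (time, lat, lon) and that the weekly statistic is the mean; the text does not state which statistic was used, and the function name `weekly_mean` is hypothetical.

```python
import numpy as np

def weekly_mean(hourly, hours_per_week=168):
    """Average an hourly (time, lat, lon) array into weekly grids.

    Assumption: the weekly value is the mean of the hourly values.
    Trailing hours that do not fill a complete week are dropped.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_weeks = hourly.shape[0] // hours_per_week
    trimmed = hourly[: n_weeks * hours_per_week]
    # Reshape to (week, hour-in-week, lat, lon) and average over the hours.
    blocks = trimmed.reshape(n_weeks, hours_per_week, *hourly.shape[1:])
    return blocks.mean(axis=1)

# Synthetic example: three full weeks plus five spare hours on a 2 x 2 grid.
rng = np.random.default_rng(0)
hourly = rng.random((3 * 168 + 5, 2, 2))
weekly = weekly_mean(hourly)
print(weekly.shape)
```

The spatial downscaling from 0.1° to 0.06° is not sketched here, since the resampling method is not described in the text.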
The analysis was conducted with two temporal aggregation strategies:
• Chronological: single months and seasonal or yearly aggregations
• Pollution peaks: selection of weeks where the highest concentrations were recorded; to this aim, two thresholds based on quantiles (median, 3rd quartile, and 90th percentile) were set: a first threshold to define how many weeks to include, and a second one to define the parameter that represents the pollution measurement for each week on the whole territory, for a total of 3x3=9 sample selections (e.g., in the median-3rd quartile sample, the most polluted 50% of the weeks are included, and the pollution level of each week is computed as the 3rd quartile across all cells in the study area).
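The pollution-peaks selection can be sketched as follows. This is one possible reading of the two-threshold scheme, not the published implementation: it assumes the weekly data are arranged as an array of shape (weeks, cells), that weeks are ranked by their territory-wide level, and that a week is kept when its level is at or above the chosen quantile of all weekly levels. The names `QUANTILES` and `select_peak_weeks` are hypothetical.

```python
import numpy as np

# The three quantiles named in the text, used for both thresholds.
QUANTILES = {"median": 0.50, "3rd quartile": 0.75, "90th percentile": 0.90}

def select_peak_weeks(weekly, q_weeks, q_level):
    """Return the indices of the weeks kept in one peak sample.

    weekly  : array of shape (n_weeks, n_cells)
    q_level : quantile across cells that summarises each week
    q_weeks : quantile of the weekly levels used as inclusion cutoff
    """
    weekly = np.asarray(weekly, dtype=float)
    level = np.quantile(weekly, q_level, axis=1)  # one value per week
    cutoff = np.quantile(level, q_weeks)
    return np.flatnonzero(level >= cutoff)

# Synthetic data: 40 weeks, 100 cells.
rng = np.random.default_rng(42)
weekly = rng.gamma(shape=2.0, scale=10.0, size=(40, 100))

# Build all 3 x 3 = 9 sample selections.
samples = {
    (w_name, l_name): select_peak_weeks(weekly, w_q, l_q)
    for w_name, w_q in QUANTILES.items()
    for l_name, l_q in QUANTILES.items()
}
print(len(samples))
print(len(samples[("median", "3rd quartile")]))
```

With the median as the first threshold, half of the synthetic weeks are retained, which matches the "most polluted 50% of the weeks" example in the text.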
From the spatial point of view, a first analysis was performed for the whole study region on a regular grid, obtained by downscaling the CAMS grid cells overlapping the Lombardy region. A second analysis was focused on urban areas only, i.e., cells with more than 25% of the surface covered by built-up land use classes (including buildings, streets, industries, and other infrastructures), which correspond to 12.6% of the region. The derived land use map with relevant colour-coded clusters is reported in Figure 1, along with examples of the analysis grid.
The considered environmental factors, whose correlation with pollutants concentration was inspected, were land use fractions, computed as the percentage of cells' area covered by different land use classes. To this purpose, the Lombardy region land use map, openly released as a vector layer (scale 1:5000, production year 2018) from the regional geoportal (https://tinyurl.com/cuwj7auh), was the reference land use data source. Agricultural areas were further reclassified into crop types by means of intersection with the Lombardy region agricultural land use vector map (https://tinyurl.com/rafdxkkp). Road infrastructure areas were extracted from the Lombardy region topographic database (https://tinyurl.com/musuh2yj).
In detail, the considered classes of land use were built-up area (all together, and separately for buildings, industries, and streets), agricultural area (all together, and separately for rice, corn, and cereal crops) and natural area. A complete description of the considered features, with their corresponding data sources, is provided in Table 1. The values of such features were computed for each cell with three different approaches: considering each cell alone, considering the average of the surrounding 8 cells (thus focusing only on the edge conditions regardless of the point of measurement), and considering the average among both the central and the surrounding 8 cells.
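The three spatial filtering approaches can be sketched on a regular 2D grid of land use fractions. The sketch below is illustrative only: it assumes that cells on the border of the grid are averaged over the neighbours that actually exist (the text does not say how borders were handled), it does not handle missing values, and it requires a grid with at least two cells. The function name `neighbourhood_features` is hypothetical.

```python
import numpy as np

def neighbourhood_features(grid):
    """Return (single cell, mean of surrounding cells, mean of 3x3 block)."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    # Zero-pad values and a matching mask so that border cells only
    # count the neighbours that fall inside the grid.
    padded_vals = np.pad(grid, 1, mode="constant", constant_values=0.0)
    padded_ones = np.pad(np.ones_like(grid), 1, mode="constant", constant_values=0.0)
    block_sum = np.zeros_like(grid)
    block_cnt = np.zeros_like(grid)
    for dr in range(3):
        for dc in range(3):
            block_sum += padded_vals[dr:dr + rows, dc:dc + cols]
            block_cnt += padded_ones[dr:dr + rows, dc:dc + cols]
    ring = (block_sum - grid) / (block_cnt - 1)  # surrounding cells only
    block = block_sum / block_cnt                # central + surrounding cells
    return grid, ring, block

# Toy 3 x 3 grid with values 0..8.
single, ring, block = neighbourhood_features(np.arange(9).reshape(3, 3))
print(ring[1, 1], block[0, 0])
```

For the central cell the surrounding-cells mean uses all 8 neighbours, while a corner cell uses only the 3 neighbours inside the grid.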

Correlation Analysis
The correlation between environmental variables and pollutant concentrations was studied with univariate modelling, computing the linear correlation between the rank values of each record (one cell, one week), with the concentration of each pollutant (separately for PM2.5 and NH3) as the dependent variable y and each environmental variable as the independent variable x. The strength of correlations was evaluated through Spearman's correlation coefficient (R in the following), which measures the correlation between the ranks of the data points in the distributions of the two measures under consideration. This methodology was chosen as it allows tackling the heterogeneity in data distribution and does not imply any assumption about data normality.
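As an illustration of the rank-based definition, Spearman's coefficient can be computed as the Pearson correlation of the ranks of x and y, with tied values receiving the average of the ranks they span. The sketch below uses NumPy only and synthetic values; the helper names `average_ranks` and `spearman_r` are hypothetical and this is not necessarily how the published code computes R.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Sum the ranks of each distinct value, then divide by its count.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman_r(x, y):
    """Spearman's R as the Pearson correlation between the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic example: a land use fraction (x) and a concentration (y)
# linked by a monotonic but non-linear relation.
x = np.array([0.10, 0.35, 0.20, 0.80, 0.55])
y = x ** 3
print(spearman_r(x, y))
print(average_ranks([1, 2, 2, 3]))
```

Because only the ranks enter the computation, any strictly increasing relation between x and y gives R close to 1, and any strictly decreasing relation gives R close to -1, regardless of the shape of the two distributions.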

Processing Workflow
An iterative procedure was applied to compute the correlation analysis in each setting, that is, for the two pollutants (PM2.5, NH3), in each time protocol (27 different chronological aggregations and 9 different pollution peaks aggregations), separately for the two spatial aggregations (whole territory or urban areas only), considering each environmental variable (for a total of 9) computed with the three different spatial filtering approaches (single cell, surrounding cells, whole block of central and surrounding cells). This workflow was implemented to consider both space and time dynamicity in the phenomenon. As a result, a total of 3888 correlation coefficients were computed.
Data sources*: 1 - Regional land use DUSAF https://tinyurl.com/cuwj7auh; 2 - Regional topographic database https://tinyurl.com/musuh2yj; 3 - Regional agricultural land use https://tinyurl.com/rafdxkkp
Figure 1. Map of the study territory (Lombardy region, northern Italy) with land-use classification (A), measurement grid for pollution concentration with example measure of NH3 (B) and fraction of corn fields (C).
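The total of 3888 coefficients follows from the product of the settings listed in the workflow (2 pollutants x 36 time protocols x 2 spatial aggregations x 9 variables x 3 spatial filters). The enumeration can be sketched as below; the labels of the time protocols are placeholders, since the individual aggregations are not listed in the text, and the variable labels paraphrase the land use classes described earlier.

```python
from itertools import product

pollutants = ["PM2.5", "NH3"]
# Placeholder labels: 27 chronological and 9 pollution-peak aggregations.
time_protocols = [f"chronological_{i}" for i in range(1, 28)] + [
    f"peaks_{i}" for i in range(1, 10)
]
spatial_domains = ["whole territory", "urban areas only"]
variables = [
    "built-up (total)", "buildings", "industries", "streets",
    "agricultural (total)", "rice", "corn", "cereals",
    "natural",
]
spatial_filters = ["single cell", "surrounding cells", "central and surrounding cells"]

# One correlation coefficient is computed for each combination.
settings = list(
    product(pollutants, time_protocols, spatial_domains, variables, spatial_filters)
)
print(len(settings))
```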

Data and Software Availability
Data processing and correlation analysis were performed with Python (v3.7) programming language, while graphical representations were implemented in QGIS. All the code used for the analysis is publicly available on GitHub (https://github.com/gisgeolab/D-DUST/tree/WP4). Input data for the analysis, including CAMS datasets and Lombardy region land use maps, are open and accessible at the links and references provided in Section 2.1. Sample analysis-ready data, together with documentation on data pre-processing, were also published on Zenodo (https://doi.org/10.5281/zenodo.6906903).

Whole Territory
Concerning the built-up fraction of the environment, its total amount had the strongest correlation with PM2.5 concentration during April (R=.876), while the correlation with the pollution peaks was R=.795; among the single components, the strongest correlation was recorded for the industrial area (R=.873 in April, R=.856 during pollution peaks). With regards to NH3, the strongest correlation was recorded in November (R=.682), while the correlation with pollution peaks was R=.669; again, the most correlated component was the industrial area, with R=.795 in November and R=.746 during pollution peaks.
The total agricultural area was most strongly correlated with PM2.5 during January (R=.844), while during pollution peaks it reached R=.833. Considering the different crops, the highest correlation was obtained with corn (R=.914 both in January and during pollution peaks), while a lower but significant correlation was found for cereals (R=.784 and R=.775), with a weak correlation for rice (R=.352 and R=.316). Concerning NH3, the highest values were in March (R=.886), and the correlation coefficient reached R=.869 during pollution peaks. Again, the correlation was very high with corn (R=.947 in March, R=.949 during pollution peaks), lower but significant for cereals (R=.829 and R=.809) and absent for rice (R=.204 and R=.183).
Finally, natural areas were always protective, hence with negative values of R (representing an inverse relationship between the two measures): R=-.912 during pollution peaks and a minimum value of R=-.914 in January in relation to PM2.5, R=-.928 (peaks) and R=-.934 (March) for NH3. Main results are reported in Table 2 and Figure 2.

Urban Area
Focusing on the urban areas only, all correlations were consistently lower. PM2.5 was mostly correlated with the total built-up surface during January (R=.634) and reached R=.637 during pollution peaks, with again the industrial area as the main factor (R=.74 in January, R=.76 during peaks), while NH3 did not correlate with this variable (max value R=.137 in January).
Considering the agricultural land, a lower correlation was recorded with PM2.

Discussion
For the interpretation of results, it is worth pointing out that the highest correlation values were generally obtained by considering both the central cell and the surroundings, meaning that the methodology considering an enlarged perspective was capable of better capturing possible cause-effect relationships. The only exception stands for NH3 in the urban areas only, where the largest correlations with agricultural areas were obtained considering only the surrounding cells (suggesting that the edge conditions matter the most in this set-up, regardless of the point of measurement), and the highest values for built-up environment emerged when considering only the single cells. Moreover, while some interesting considerations can be drawn from chronological aggregations, an aggregation based on pollution peaks is more significant (as these periods represent the worst threat to population health), and greater attention should be paid to those results.

Interpretation of results for PM2.5
As can be expected on the basis of well-established knowledge, the fraction of built-up area in the territory emerged as a significant factor for the concentration of PM2.5, both considering the whole territory and focusing on urban areas only; it must be noted that, in this second case, lower values were recorded, but this result could be partially explained by the significant reduction in the size of the data sample, and this consideration applies to the whole analysis. In the urban area, the correlation peak was recorded in January (R=.634), as can be expected considering the impact of heating systems, while, when enlarging the analysis to the whole territory, the maximum correlation was recorded in April (R=.876), secondarily in November (R=.81) and October (R=.806), with a lower value of R=.738 in January. The correlation appeared much stronger in the larger perspective, but the different timing seems to suggest a more multifaceted phenomenon than the effect of heating systems alone. Similar conclusions can be drawn considering pollution peaks, where the impact of built-up terrain reached R=.795 on the whole territory, thus lower compared to chronological aggregation, and R=.637 in the urban areas, thus instead slightly higher than the chronological aggregation.
Figure 2. Main results of correlation analysis, evaluated through Spearman's correlation coefficient, between PM2.5-NH3 and different land-use classes, separately for the whole territory (Lombardy region, northern Italy, upper panels) and for the strongly urban areas only (lower panels), considering time aggregations based either on chronological order (left panels) or on peaks of pollutant concentration (right panels).
In all experimental settings, the single component of the built-up environment with the highest impact was the industrial areas (R=.873 on the whole territory, R=.74 in urban areas); a significant correlation was also found for buildings (R=.841) and roads (R=.783) considering the whole territory, while their impact on urban areas only was limited (R=.356, R=.599).
Concerning agricultural terrains, the correlation with PM2.5 in the urban environment was limited, with the only mildly significant value found for cereal crops (during pollution peaks with R=.658). Conversely, a strong correlation was found when considering the whole territory, in particular for corn crops, while a mild impact was due to cereals, and a weak correlation was found for rice. The maximum value in chronological aggregations was lower compared to the fraction of built-up area, with R=.844 against R=.876, but it was instead higher when considering pollution peaks, with a maximum of R=.833 against R=.795. This is a particularly relevant result, as it suggests that the most intense peaks of PM2.5 concentration on the territory occurred in correspondence with more densely farmed areas rather than in the most urbanized ones. However, when considering population exposure, this aspect is consistently relevant only for people living in mildly urbanized areas, while for people living in larger cities, the fraction of built-up environment on the territory has a stronger impact.
Finally, natural areas were strongly correlated with reduced pollution concentrations.

Interpretation of results for NH3
The perspective is different when considering NH3 pollution. In this case, the impact of built-up terrain was limited, with R=.669 during pollution peaks, and was not relevant when focusing only on urban areas (max R=.137). This result may suggest that the level of urbanization contributes to NH3 concentration only up to a certain threshold, after which a further increase in the amount of built-up environment is no longer impactful. Conversely, the role of agricultural terrains on NH3 concentration was evident on the whole territory, with R=.869 during pollution peaks, and significant also in urban areas only, with R=.8. Specifically, the correlation was particularly high when considering corn crops, with R=.949 and R=.868, respectively on the whole territory and in urban areas only, while lower values were found for cereals (R=.809 and R=.459) and no significant correlation was recorded with rice (R=.183 and R=.245).
As for natural areas, the results are equivalent to those for PM2.5, hence they were strongly correlated with reduced pollution concentration.

Limits and future developments
The main limitations of this study are related to the temporal extension, which was limited to two years and is therefore susceptible to random noise in the data, as well as to the impact of other variable factors (such as meteorological conditions, in particular wind), and to the spatial resolution, which, despite the downscaling, was still limiting the sample size and variability. To cope with such issues, the same analysis should be repeated over the years, with new data available, and possibly extended and/or replicated on different territories. Moreover, from a statistical point of view, the proposed modelling is univariate and is therefore missing possible interactions between the different considered environmental variables. In this perspective, a possible development will be to include a multi-variate model, e.g., with a multivariate logistic regression (where variables are evaluated through odds ratio) and/or with a random forest algorithm (e.g. evaluating variables with the SHAP (Lundberg and Lee, 2017) methodology).

Conclusion
The proposed study explores the correlation between the concentration of two pollutants, PM2.5 and NH3, and different classes of land use, in particular related to the built-up environment and agricultural activities, adopting a data-driven and iterative approach and taking advantage of continuous mapping. The analysis was first performed on the whole study territory (the region of Lombardy, in northern Italy), and then repeated considering only the most urbanized areas, to focus on population exposure. Results showed that both the built-up environment and the agricultural terrains were significantly correlated with pollution; in particular, considering the whole territory, the worst peaks of PM2.5 were more correlated with agricultural areas than with the fraction of built-up environment, suggesting that people living outside of larger cities were more affected by farming activities than they were by urbanization. This does not hold within metropolitan areas, where the impact of agriculture was much more limited compared to urbanization. Concerning NH3, results were more aligned in indicating that the strongest correlation was with agricultural activity, while that with the fraction of built-up environment was weaker and limited up to a certain level of urbanization. Finally, the component of the built environment that seemed to contribute the most to pollution was industry, while the most polluting crop appeared to be corn.