Public health needs GIScience (like now)

During the last 20 years we have seen the re-emergence of diseases, the emergence of new diseases in new locations, and outbreaks of varying intensity and duration. Spatial epidemiology plays an important role in understanding the patterns of disease and how they change over time and across space. The aim of this paper is to bring together public health and geospatial data science perspectives to provide a framework that facilitates the integration of geographic information and spatial analyses at different stages of a public health response, so that these data and methods can be used effectively to enhance surveillance and monitoring, support intervention strategies (planning and implementation of a response), and facilitate both short- and long-term forecasting. To demonstrate elements of this framework and how it can be used, we selected three case studies ranging from the current global COVID-19 pandemic of 2020 to more historical examples such as the John Snow cholera outbreak of 1854 and the Ebola outbreak of 2014 in West Africa. A variety of methods were used, including spatial descriptive statistics and methods for analysing patterns. The examples we provide can reveal sources of infection and connectivity between locations, delineate zones of containment, and show the spread of an outbreak globally and locally across space and time.


Introduction
During the past 20 years we have seen the emergence and re-emergence of many diseases (Figure 1), many in new locations, including the current pandemic, COVID-19 (Figure 2).
A key component of staying healthy is to minimize our risk of getting sick. To do so, we want to know how to avoid getting ill by understanding where a disease is, when it is present, if there is a temporal component to its incidence, and what preventative measures we can take to stop us from becoming ill.
Patterns influencing health and disease in the environment are complex and require an understanding of the ecology of the disease (agent, host, environment, and how these interact in space and time) and how diseases may move through the landscape (mobility, connectivity, and dispersion pathways) so that we can respond (plan and implement control and prevention) and recover (seek diagnosis, prevent, and provide treatment) in a timely manner. This requires understanding the interplay between diseases, their environments, and their hosts (ecology of disease) and how these may change risk over time. We need to think simultaneously about how a disease agent and the host interact at various spatial and temporal scales in a dynamically changing environment and what the outcome of such changes may be.

Figure 1 (caption, partial): The word cloud shows diseases based on the number of reports that contain that disease name in the summary report title provided by the DONs (N > 5 reports). Analysis was conducted in R. Diseases include those that may have occurred for a variety of reasons (8,9), as summarized by the four categories:
• emergence of new diseases in new locations
• evolution of disease resulting in the emergence of new pathogens and resistance
• re-emergence of eliminated diseases in the same or nearby locations
• regular occurrence of diseases in the same location
In recent years, access to novel data sources has been increasing with the availability of new devices that enable data to be collected easily, alongside point-of-care diagnostics, at a precise location in time. These technologies range from mobile device add-ons (e.g. spectrometers), mobile apps and wearable technologies (e.g. GPS watches, Fitbits) to remote sensors (e.g. Wi-Fi loggers collecting a variety of environmental data; unmanned aerial vehicles (UAVs)), many of which have built-in GPS. In the COVID-19 era, apps that use Bluetooth technology to notify people of possible exposures have been developed and are now in use (11). In addition, apps that allow patrons of restaurants and other social venues to register are being used to facilitate contact tracing (14). Through these data collection avenues, we are able to provide richer and more diverse sources of information about ourselves and the environments in which we live than ever before. Although we have moved into an era of digital exploration, there remain many challenges in using, analysing, integrating, and applying these data, particularly when they vary in quality and availability (both in terms of quantity and rapidity) (15,16). Leveraging these data and technologies together with existing surveillance methods for humans and animals will be useful for improving our understanding of the mechanisms influencing health across different spatial and temporal scales, enhancing diagnostics and predictions, and developing preventative strategies. Furthermore, with increased mobility and the influence of external factors, such as changes in climate and globalization, we need to integrate multiple types of geographic data that capture not only the physical environment, but also human and social environments (e.g. perception, cultural, economic, political).
This will facilitate a better understanding of what is happening at a local level, with regional level influences, as illustrated by the recent swift global distribution of the novel coronavirus SARS-CoV-2, which causes COVID-19 (17) (Figure 2).
Approaches to disease mapping and spatial epidemiology range in complexity from the creation of simple maps (e.g. John Snow's cholera map of 1854 (5)) and graduated points (Figure 2) to deterministic, correlative, geostatistical and geocomputational modelling techniques, as summarized in Table 1. For examples, see (18-20) and malaria maps produced using different methods, including suitability analysis (21), Bayesian geostatistical methods (22), and geocomputational methods with host-pathogen-environment models (23).
However, with big data analytics (34,35), GeoAI (36) and increased access to geographic data, much more can be done with existing surveillance data. For food-, water- and air-borne infections, residential addresses and zip codes of people reporting symptoms and pathogens, stratified by age and sex, can be mapped in space and time to examine incidences of infections within precise geographic areas (e.g. tuberculosis in South Africa (37,38)) and for targeted responses (e.g. vaccine deployment for cholera in Kolkata (39)). Geographic cluster detection is meaningful even for infections with person-to-person spread, such as sexually transmitted and blood-borne infections, as studies have demonstrated surprisingly dense clustering of street-involved people who sell sex (e.g. (40)).

Table 1 (partial).
Geocomputation allows for flexible spatial simulation, but can be computationally intensive.
Spatial regression: Standard statistical regression models are not appropriate for analysing spatially dependent data. Instead, several spatial regression methods have been developed.
• Spatial autoregressive models. Simultaneous autoregressive (SAR) models are frequentist approaches designed to address spatial autocorrelation. They incorporate spatial autocorrelation using neighbourhood matrices that specify relationships between neighbouring data points.
• Bayesian regression models. Bayesian regression models provide an alternative to SAR models. They can be used to estimate the effects of potential risk factors related to a disease by including fixed covariates along with the random effects.
• Geographically weighted regression (GWR). GWR models spatially varying relationships using a local linear regression model.
Decision support systems:
• Can exploit multiple technologies (geographical information systems, statistical and mathematical models, decision-support modules) and multiple data sources, and permit widespread dissemination of epidemiological data.
• Spatial simulation; geocomputation
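The neighbourhood matrices that spatial autoregressive models rely on can be illustrated with a small sketch. The example below is not a SAR model; it only builds a binary neighbourhood matrix for a regular grid and uses it to compute global Moran's I, a standard measure of the spatial autocorrelation that such models are designed to address. The function names, the use of edge-sharing (rook) neighbours, and the incidence values are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary neighbourhood matrix for a regular grid (cells sharing an edge)."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:              # neighbour below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:              # neighbour to the right
                j = r * n_cols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    s0 = w.sum()                            # sum of all weights
    return (len(z) / s0) * (z @ w @ z) / (z @ z)

# Hypothetical incidence rates on a 4 x 4 grid of areas (illustrative only)
clustered = np.array([[9, 9, 1, 1],
                      [9, 9, 1, 1],
                      [1, 1, 1, 1],
                      [1, 1, 1, 1]], dtype=float).ravel()
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()  # alternating 0/1

w = rook_weights(4, 4)
i_clustered = morans_i(clustered, w)        # positive: similar values are adjacent
i_checker = morans_i(checkerboard, w)       # negative: neighbours always differ
```

Positive values indicate that similar rates tend to be neighbours, and negative values indicate that dissimilar rates tend to be neighbours. In practice the matrix would be derived from real administrative boundaries rather than a grid.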
To accomplish these different tasks, public health epidemiologists require sufficient training in concepts of geography and a variety of methodologies and techniques (e.g. (41-45)) including spatial analytical (28) and web-mapping methods, which are still largely absent from many educational curricula, with only brief mentions of these methods and tools (41-47). Although there has been an increase in the inclusion of data science in the health sciences (e.g. (48)), spatial analysis and the ability to examine disease incidences within geographic contexts is still largely missing (49) as highlighted in the recent article (48) on data science for public health that does not include any reference to spatial data science. This is hindering the ability to incorporate crucial, process-based understandings of health events within the context of different geographies which may influence disease outcomes (49). Geographies may include population (e.g. density, lifestyle, demographic characteristics); physical environment (e.g. land use, climate [temperature, wind, precipitation], topography, water bodies, soil type); mobility (e.g., transportation nodes, infrastructure); health facilities (e.g. location, type, availability, and accessibility) or human and social geographies such as boundaries, places of interest, social venues, cultural locations, and activity spaces. Integrating these with disease analyses will enhance public health planning and intervention (28,49).

Methodology
As technologies continue to evolve and different geographic data become available, how can we better incorporate these into a process that can help public health practitioners evaluate disease and health risks in both the short and long term? Essentially, how do we train epidemiologists in geography and geospatial technologies and methods? To address this, we have centred our evaluation around a public health response cycle that encompasses several steps important for investigating, evaluating, and managing disease incidence and outbreaks, as described in (42,50-52) and summarized in Table 2a-c from a number of different reviews. We further demonstrate how different spatial and mapping methods and analyses may be used by providing several case studies that range from local outbreaks to a global pandemic. These include the John Snow cholera outbreak of 1854, the Ebola outbreak of 2014 in West Africa and the ongoing global COVID-19 pandemic that started in 2020.

Ecology of Disease - Detecting an outbreak or health event through surveillance:
The initial stage of the cycle is detection, where ideally an outbreak or health event is discovered through consistent monitoring when an unexpectedly high number of people in a small geographic location (e.g. one city or hospital) are diagnosed with the same condition. Surveillance is defined as the collection, compilation and analysis of data on health conditions, which includes dissemination of information to those who need to know, including health care staff and policy makers (53). For many infectious diseases, reporting is mandated by law: demographic, locating, laboratory and clinical data on people who have the condition (known as cases) are collected by health care and laboratory professionals, who notify local, national and international (e.g. WHO (54)) public health agencies (55). Criteria for what constitutes a case of the disease under surveillance are published by state, provincial, federal or international authorities and usually include a positive laboratory test for the pathogen and signs and symptoms consistent with infection. As soon as the number of cases rises above the epidemic threshold, based on past mean rates and standard deviations, a potential outbreak exists, which is verified after a preliminary check for issues such as possible laboratory or data entry errors. Many surveillance systems, particularly for infectious diseases, contain the minimum data needed to describe the affected people by person, place, and time. The age and sex of cases are tabulated and graphed, together with their residential addresses and the dates of onset, presentation at a clinic, specimen collection, and reporting of results to the public health department (e.g. DONs (1)).
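The epidemic threshold described above can be sketched in a few lines. The text states only that the threshold is based on past mean rates and standard deviations; the multiplier `k = 2`, the function names, and the baseline values below are assumptions made for illustration, and real surveillance systems may define their thresholds differently.

```python
from statistics import mean, stdev

def epidemic_threshold(baseline_rates, k=2.0):
    """Historical mean plus k sample standard deviations (k is an assumed choice)."""
    return mean(baseline_rates) + k * stdev(baseline_rates)

def flag_potential_outbreak(current_rate, baseline_rates, k=2.0):
    """True when the current rate exceeds the threshold and warrants verification."""
    return current_rate > epidemic_threshold(baseline_rates, k)

# Hypothetical rates per 100,000 for the same reporting week in previous years
baseline = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0]
threshold = epidemic_threshold(baseline)

high_week = flag_potential_outbreak(12.0, baseline)   # well above the baseline
usual_week = flag_potential_outbreak(7.0, baseline)   # within the usual range
```

A flag raised this way marks only a potential outbreak; as noted above, it still needs to be verified against laboratory and data entry errors.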

Developing an understanding of the ecology of a disease.
These data, coupled with laboratory results on the pathogen identified, are usually sufficient to form sound hypotheses as to source and exposure (50). Because they include geography, they allow geographic visualizations and spatial analyses to be performed in GIS (Geographic Information Systems) and other such software packages. Through these methods and other case data, public health staff are able to identify clusters that highlight outliers or hotspots, examine interactions and relationships through the integration of different types of data (environment, host, pathogen), and compare cases with the rest of the population stratified by different attributes such as geography, time, symptoms, age, or sex.
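One simple way to ask whether case locations are more clustered than expected by chance is a Monte Carlo test on the mean nearest-neighbour distance. The sketch below is an assumption-laden illustration rather than the method used in this paper: it compares the observed pattern with points placed uniformly at random in a rectangular study area, which ignores where the population at risk actually lives. The function names, coordinates, and study-area bounds are hypothetical.

```python
import numpy as np

def mean_nearest_neighbour_distance(points):
    """Mean distance from each point to its closest other point."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # ignore each point's distance to itself
    return dist.min(axis=1).mean()

def clustering_p_value(points, bounds, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value against uniform random placement."""
    rng = np.random.default_rng(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    observed = mean_nearest_neighbour_distance(points)
    n = len(points)
    as_extreme = 0
    for _ in range(n_sim):
        sim = np.column_stack([rng.uniform(xmin, xmax, n),
                               rng.uniform(ymin, ymax, n)])
        # Smaller mean distances indicate tighter clustering
        if mean_nearest_neighbour_distance(sim) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_sim + 1)

# Hypothetical cases: 30 locations grouped around (5, 5) in a 10 x 10 study area
case_rng = np.random.default_rng(42)
cases = case_rng.normal(loc=5.0, scale=0.3, size=(30, 2))
p_value = clustering_p_value(cases, bounds=((0, 10), (0, 10)))
```

A small p-value suggests clustering relative to uniform placement. A fuller analysis would simulate from the underlying population distribution, stratified by attributes such as age and sex, so that dense residential areas are not mistaken for disease clusters.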

Response - prevention planning and implementation of interventions to minimize risk and enable recovery and treatment:
Once we understand the ecology of the disease, the next stage of an outbreak or health event is to develop a response that includes implementing prevention measures that range from educating the public and health officials, to infrastructure needs such as providing sanitation, developing new vaccinations or the placement of new health facilities. In the last stage of the response cycle, surveillance for all pathogens of public health importance continues after prevention measures have been taken, to ensure that no new cases arise and to detect new outbreaks (Table 2b).

Communication - informing the public
During each of these stages, communication strategies are important to ensure that up-to-date information is provided (Table 2b). This can take many different forms, ranging from published documents (1,2) to interactive web maps (56,57) that are updated in real time (e.g. the COVID-19 dashboards provided by WHO (58) and Johns Hopkins (59)) or at other time intervals (e.g. weekly (60) or ad hoc, such as the CDC Travel Recommendation Map (61)), depending on needs.

Case Studies
To demonstrate how different spatial and mapping analyses may be incorporated at each of the different steps of this framework, we provide several examples ranging from local outbreaks to a global pandemic. These include the John Snow cholera outbreak of 1854, the Ebola outbreak of 2014 in West Africa and the ongoing global COVID-19 pandemic (2019-ongoing).

Software and Data Availability
All data used during each of these analyses are available in the public domain and are listed in Table 3. All analyses were completed in ArcGIS and Excel.

Ecology of the Disease - Determine sources of infection:
Originally found in bats, Ebola may contaminate fruit and places where children play. It then transmits from person to person through direct contact (via broken skin or mucous membranes) with body fluids; contact with contaminated items such as clothes, bedding, and medical equipment; contact with infected bats or non-human primates; and sex with an infected person. Ebola is new in West Africa, where populations are more urban (6).

1. Visualize and examine outbreak cases: Map the location of Ebola cases over time to assess the spatial distribution of cases and spread of disease.
2. Collect more data: Collect detailed information on cases, where and when they occurred, and on their contacts through contact tracing. On Jan 24, the head of the health post in Meliandou informed public health about 5 people with diarrhoea who died; the disease appeared similar to cholera, so nothing was done. MSF then investigated again on Jan 27, and also indicated cholera. The Guinea Ministry of Health issued an alert on March 13; WHO Africa investigated on 14-25 March and found cases in three different places linked to the largest city with health care closest to Meliandou (6).
3. Identify source of infection: single vs multiple sources of infection. There does not seem to have been any.
Response: Investigations into contacts and cases; safe burial for those who died (mandatory cremation); quarantine of those affected to a crowded slum of 75,000 people; closure of markets; restriction of movement of patients and contacts; and curfews (12).
Findings: Weak health systems; undetected cases migrated to Sierra Leone and Liberia; crowding of cases (12).
Continued surveillance: To ensure no new cases.

Figure caption (partial): ... 65)) and (C) shows the changing areas of risk using the cluster and outlier analysis (Anselin Local Moran's I) with the spatial relationship defined as contiguity (edges and corners). (C) shows the same information as in (B) but highlights clustering (high-high: high incidence rates surrounded by high incidence rates; low-low: low incidence rates surrounded by low incidence rates; high-low and low-high: dissimilar areas or outliers, where areas of high incidence rates are surrounded by areas of low incidence rates and vice versa). Analyses for (C) were performed in ESRI ArcGIS 10.8.
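The high-high, low-low, high-low and low-high categories described in the caption above can be illustrated with a simplified local Moran's I calculation. This sketch is not the ArcGIS implementation: it uses a common standardised form of the statistic on a hypothetical regular grid, defines neighbours by shared edges and corners, and omits the permutation-based significance testing that a full cluster and outlier analysis would apply. All names and values are assumptions made for illustration.

```python
import numpy as np

def queen_neighbours(n_rows, n_cols, r, c):
    """Cells sharing an edge or a corner with cell (r, c)."""
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                out.append((rr, cc))
    return out

def local_moran(grid):
    """Local Moran's I (standardised form) and a quadrant label for each cell."""
    z = (grid - grid.mean()) / grid.std()   # assumes the rates are not all equal
    lag = np.zeros_like(z)
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            nbrs = queen_neighbours(*grid.shape, r, c)
            # Row-standardised spatial lag: mean z-score of the neighbours
            lag[r, c] = np.mean([z[i, j] for i, j in nbrs])
    local_i = z * lag
    labels = np.where(z > 0,
                      np.where(lag > 0, "high-high", "high-low"),
                      np.where(lag > 0, "low-high", "low-low"))
    return local_i, labels

# Hypothetical incidence rates on a 4 x 4 grid of areas (illustrative only)
rates = np.array([[9, 9, 1, 1],
                  [9, 9, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 1]], dtype=float)
local_i, labels = local_moran(rates)
```

Here the top-left block is labelled high-high, the far corner low-low, and low-rate cells bordering the block low-high. Without significance testing these labels describe every cell, whereas a mapped result would normally show only the statistically significant clusters and outliers.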

Description of outbreak: Unusual pneumonia was detected in 27 people in Wuhan, China, most of whom were vendors at a seafood and wildlife market, as of Jan 2 (3).

Ecology of the Disease - Determine sources of infection:
1. Visualize and examine outbreak cases: Map the location of all infected cases to determine what relationships exist with each other and the environment in which they are interacting. Examine how close the cases are to each other. Determine if more of these cases are clustered together than expected by chance, given random placement, allowing for sex and age. Identify overlapping activity spaces and common "hang out" locations. Add context by mapping where the infected are in relation to other places in the area frequented by those that are ill. Identify common features within the area of interest (e.g. food sources, markets).
2. Collect more data: 121 contacts being observed by physicians, Jan 3 (7). Conduct in-depth interviews with those that are ill and those that are well to obtain further information on all possible hypothetical exposure locations to the pathogen. Obtain detailed data on symptoms, clinic visits and hospitalisations; places visited just before each person became ill (e.g. restaurants, parties, day trips, markets) along with interactions with animals and where these took place.
3. Identify source of infection: single vs multiple sources of infection: From the in-depth interviews/questionnaires and maps, identify additional potential sources where respiratory disease may have been acquired. Source identified as a coronavirus (10).
a. Hypothesis: Transmission by person to person is most likely given the number of cases in Japan and South Korea and the number of confirmed health care workers that are infected.
b. Hypothesis: Mode of transmission is by droplet and/or contaminated surfaces.
Response: Close the market in Wuhan. Implement social-distancing measures; temperature checks on travellers into Hong Kong (13); create technological apps to monitor the situation; develop and roll out a vaccine to reduce infections.
Findings: Ongoing. From the time the market closed to the isolation of infectious people and the implementation of social distancing, it reportedly took 5 weeks for no new locally transmitted cases to emerge. Since then, monitoring has continued with various closures and lockdowns to manage cases locally and at a country level.
Continued surveillance: To ensure no new cases. Surveillance is ongoing as variants emerge. Surveillance of vaccination rollouts and coverage is ongoing.

Discussion
By their very nature, the geospatial sciences are interdisciplinary, central to everything we do, and to everything with which we interact. Maps and geospatial technologies have been useful for showing where disease outbreaks may be taking place, identifying potential sources of infection, and determining who may be affected when and where. However, the steep learning curve associated with using many GIS packages has resulted in the slow uptake of GIS in many fields (70). As we enter the digital (data) revolution and the age of web mapping (70), it will become critical to develop ways that integrate these methods and data so as to enhance communication efforts (71), sharing of sensitive data (see (72,73)) and analytical capabilities. Examples of these include better integration of geographic analysis with other types of data such as phylogenetic data (74,75); clustering methods (76) and forecasting in real time (77) at all stages of public health surveillance, planning and response. This has been highlighted by the many analyses, maps and interactive dashboards that have been created during the COVID-19 pandemic (78,79), including identifying hotspots (80), modelling risk (81,82) and spread (83), as well as integrating environmental data to examine factors influencing COVID-19 (84) and the need for demographic characteristics (85) to better assess who may be at risk when.
As we move forward, we need to develop new methods and integrate Geography, GIScience and Spatial Data Science into the core curriculum of public health to provide a unified approach across space and time so that we can improve how we monitor and manage health and well-being and are better prepared for the next outbreak.