Creation of a Model for Estimating the Home-return Rate of Evacuees Using Mobile Phone Movement Histories and Its Application to the Nankai Trough Earthquake

Drawing on the 2011 Great East Japan Earthquake and large-scale GPS-based people flow data, we developed a home-return model that incorporates city variables and estimates the rate of people who will have returned home any number of days after an earthquake and tsunami disaster. The sparse logit model used in this study achieved high accuracy. The model can be applied to a future disaster using only grid-based city variables from GIS data and existing damage estimation models. We also applied the model to the Nankai Trough megathrust earthquake and simulated the transition of the post-disaster home-return rate. The estimation results can help local governments plan the management of evacuation centers, in particular the management of supplies and goods for disasters. The study could contribute to a new understanding of the quantitative relationship between people returning home after evacuation and city variables with regard to earthquake and tsunami hazards, based on spatial information.


Introduction
Major earthquakes, including the Niigataken Chuetsu-oki Earthquake in 2007 and the 2011 Great East Japan Earthquake, have damaged building structures and urban infrastructure, forcing large numbers of people to evacuate [11,36]. Poor hygiene and the stress of living in a shelter for a long period posed health risks to the evacuees and made the secondary damage immense. Proactive measures for evacuees after great earthquakes remain insufficient. Therefore, in future large-scale earthquakes, it will be important to take measures for evacuees from immediately after the disaster strikes until the time of reconstruction. In particular, predicting how people move during evacuation is important for reconstruction planning, for example in calculating the amount of relief goods to be delivered to shelters, deciding the scale of shelters, or considering what kind of support long-term evacuees need. Many traditional methods predict the number of evacuees in a future disaster from questionnaire surveys carried out after past disasters. The Japanese government estimates the number of evacuees with a model that uses variables such as the rate of building collapse and the rate of lifeline damage [32]. However, this type of model cannot describe how the number of evacuees changes over time, which makes it difficult to use in a reconstruction plan that involves a time axis. Indeed, following the Tohoku Earthquake (2011), the number of evacuees could not be estimated accurately, which caused delays and inefficiency in the distribution of emergency goods [11].
Meanwhile, large-scale GPS trajectories collected from mobile phones and smartphones are increasingly being used to understand how individuals move [3,4,6,8,9,31]. These data are used in various areas of spatial information science, such as traffic management [7,10,12,13], analysis of tourist mobility [19], and prediction of population distribution and dynamics [14,21,24]. With regard to how people evacuate in a large-scale disaster, a recent study analyzed phone call records to investigate the predictability of the movement of people evacuated after the earthquake in Haiti, and indicated that most of the places people evacuated to were places they had previously visited [16,30,36]. Furthermore, mobile phone data revealed a decrease in population in various areas affected by the Great East Japan Earthquake in 2011 [1,27,28,29,30]. Other studies used Twitter data collected after the Nepal Earthquake and the Kumamoto Earthquake to understand the tweeting activities of disaster victims and the changes in their emotions after a large-scale disaster [17,33,34]. It has thus been demonstrated that evacuation activities following an earthquake can be monitored using mobile phone GPS data. These studies examine the movement patterns of people (the places they evacuate to and the distances they move), but no study has built a model that covers the period until evacuees return home by combining their individual actions with the extent of damage in chronological order. The reason is that the home return of evacuees can involve multiple factors, such as damage to building structures caused by earthquakes, damage caused by tsunami, damage to lifelines, and the attributes of the evacuees.
Other studies have simulated how people begin to evacuate after a disaster strikes, but many of them focused on the period right after the disaster. Chronological estimation that also covers the recovery period is so far limited to simple estimation methods [1,20,25]. It is therefore necessary to build an estimation model that reveals which factors influence the recovery period of evacuees and determines how long it takes for an evacuee to leave post-disaster evacuation life.
Against this backdrop, this study aims to build a model for estimating when evacuees return home by considering the multiple factors that arise from right after a disaster until they return home, using micro data on individual building structures and persons collected in the aftermath of the 2011 Great East Japan Earthquake. To build the model, we first carried out a long-term analysis using people flow data obtained from mobile phone GPS data monitored continuously before and after the Great East Japan Earthquake, together with data on the extent of damage to each building in the disaster-affected areas. We then identified the factors that influenced people's return home. Finally, we apply the model to an earthquake in the Nankai Trough, which is predicted to occur in the future, to estimate the home-return rate on each day after the disaster in Kochi City and Tosa City in Kochi Prefecture and to reveal the regional characteristics of when people return home in each area. If the critical variables can be selected after identifying the latent structure in the estimation of home return after a disaster, resilience evaluation based on urban vulnerabilities becomes possible.
In this paper, we use sparse modeling (SpM) to estimate home return from detailed data on the urban environment, and we present the results to clarify the latent structure linking home return and urban system variables.

Data analysis
In this study, we used the case of the Great East Japan Earthquake in 2011 and built a model for estimating the home-return rate on each day following the tsunami disaster, using an SpM logit model with high-spatial-resolution GIS data. The areas covered were the six prefectures of the Tohoku area: Aomori, Iwate, Miyagi, Akita, Yamagata, and Fukushima. In creating the model, we considered both property damage, including damage to buildings and lifelines caused by the tsunami, and qualitative attributes, such as the attributes of residents. Table 1 lists the variables used to create the model and their data sources. Because the data types varied, including disaggregated data and mesh-unit data, we processed the data using a 1 km grid as the unit of aggregation. Because data on water supply damage are available only from the second day after the disaster onward, only data from that day onward were used in this study.

Mobile phone location data
To estimate the home-return rate of people following the Tohoku Earthquake, we used 2011 mobile phone GPS log data called "Konzatsu-Tokei (R)" provided by ZENRIN DataCom Co., LTD. "Konzatsu-Tokei (R)" data are people flow data built from individual location data sent from mobile phones, with the users' consent, through applications provided by NTT DOCOMO, INC. The data are processed collectively and statistically in order to conceal private information. The original location data are GPS data (latitude, longitude) sent at a minimum interval of about 5 minutes and do not include information that identifies individuals. This is a large text database of approximately nine billion records belonging to about 1.5 million users throughout Japan. In this research, we used data covering an approximately one-month period from March 11 to April 7, 2011. The target sample included data from approximately 30,000 people. The data processing method devised in this research was applied to the GPS data by NTT DOCOMO, INC.

Estimation of home return rate
Next, we describe the procedure for estimating the home-return rate from GPS data, which follows the method of a previous study [22]. From the GPS data, we extracted the staying points from 0 to 4 o'clock. The position at which each evacuee stayed overnight on each date was represented by the center of gravity of the staying points from 0 to 4 o'clock (the nighttime representative point).
Then, for each individual ID, the coordinates of the estimated residence were compared with the coordinates of the nighttime representative point of each date. When the nighttime representative point was within 100 m of the estimated residential coordinates, this was regarded as returning home, and the first date on which this occurred was taken as the home-return date. Finally, the home-return rates of the evacuees for each date were aggregated into 1 km grid units. Figure 1 shows the results of the estimation of the home-return rate after the Great East Japan Earthquake in 2011. On March 13, the home-return rate was low in many grids on the Pacific side. On March 14, the third day after the disaster, there was a large difference in home-return rate between the areas on the Pacific side and the inland areas, suggesting that whether people were affected by the tsunami had a strong effect. As the days went by, on March 20, the ninth day after the disaster, and March 30, the nineteenth day after the disaster, we can see that 60 to 80% of the people had returned home in the coastal areas struck by the tsunami. On the other hand, areas surrounding the nuclear power plants were still no-entry areas on March 30. Therefore, the home-return rate remained low at 80%+.
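The home-return rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing actually applied to the "Konzatsu-Tokei (R)" data: the record layout `(user_id, date, hour, lat, lon)`, the function names, the haversine distance, the simple averaging of latitude and longitude for the center of gravity, and the sample coordinates are all hypothetical, and the estimated home coordinates are taken as given.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nighttime_points(records):
    """Center of gravity of each user's 0 to 4 o'clock fixes, per date."""
    acc = defaultdict(list)
    for user, date, hour, lat, lon in records:
        if 0 <= hour < 4:  # keep only nighttime fixes
            acc[(user, date)].append((lat, lon))
    return {
        key: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for key, pts in acc.items()
    }


def home_return_dates(night_pts, homes, radius_m=100.0):
    """First date on which the nighttime point lies within radius_m of the home."""
    result = {}
    # ISO date strings sort chronologically, so the first hit is the earliest date.
    for (user, date), (lat, lon) in sorted(night_pts.items(), key=lambda kv: kv[0][1]):
        if user in result or user not in homes:
            continue
        if haversine_m(lat, lon, *homes[user]) <= radius_m:
            result[user] = date
    return result


def home_return_rate(return_dates, users, date):
    """Share of the given users (e.g. one 1 km grid) who have returned by `date`."""
    returned = sum(1 for u in users if u in return_dates and return_dates[u] <= date)
    return returned / len(users)


# Hypothetical sample: u1 returns on March 13, u2 stays away throughout.
homes = {"u1": (38.2600, 140.8800), "u2": (38.3000, 141.0000)}
records = [
    ("u1", "2011-03-12", 1, 38.2700, 140.8800),   # about 1.1 km from home
    ("u1", "2011-03-12", 14, 38.2600, 140.8800),  # daytime fix, ignored
    ("u1", "2011-03-13", 1, 38.2601, 140.8800),
    ("u1", "2011-03-13", 2, 38.2603, 140.8800),
    ("u2", "2011-03-12", 2, 38.3500, 141.0000),
    ("u2", "2011-03-13", 2, 38.3500, 141.0000),
]
night = nighttime_points(records)
dates = home_return_dates(night, homes)
```

Calling `home_return_rate(dates, ["u1", "u2"], "2011-03-13")` then gives the share of the two sample users who have returned by March 13; in practice the user list would be the set of sample users whose estimated home falls in one 1 km grid.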

Damage data
We acquired post-tsunami survey data from the Ministry of Land, Infrastructure, Transport and Tourism, based on land surveys covering all buildings (approximately 220,000) in the inundation area [2]. The data cover the damage situation and flood depth for each building in the affected area. For the tsunami inundation depth, we used the mean and maximum inundation depth per grid. For the population of the tsunami inundation area, we used the 2010 population census data (1 km grid). The rate of residents living in a tsunami-affected area was calculated by dividing the population within the flooded part of each grid by the total population of that 1 km grid. To measure the estimated home-return rate for each damage category, we assigned the damage situation of the building at the estimated home to each individual ID of the GPS data and calculated the rate of the population living in partially destroyed and collapsed buildings.
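The two ratios described in this subsection are simple proportions, as the following sketch shows. The function names, the damage labels, and the sample numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def inundated_population_rate(pop_inundated, pop_total):
    """Population inside the flooded part of a grid divided by the grid's total population."""
    if pop_total <= 0:
        return 0.0  # empty grid: no residents to be affected
    return pop_inundated / pop_total


def damage_category_rates(user_home_damage):
    """Share of sample users per damage label of the building at their estimated home.

    user_home_damage maps an individual ID to a damage label.
    """
    n = len(user_home_damage)
    if n == 0:
        return {}
    counts = Counter(user_home_damage.values())
    return {label: count / n for label, count in counts.items()}


# Hypothetical grid with 600 residents, 150 of them in the inundated area.
rate = inundated_population_rate(150, 600)

# Hypothetical sample users of that grid and the damage assigned to their homes.
shares = damage_category_rates({
    "u1": "collapsed",
    "u2": "partially destroyed",
    "u3": "no damage",
    "u4": "no damage",
})
```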

Infrastructure data
We used data on water supply cutoff conditions in each municipality released by the Ministry of Health, Labour, and Welfare [15]. Since the published data do not include March 11 and 12, we used the data from March 13, the second day after the disaster, onward.

Attribute of population
Assuming that a difference exists in the status of home return according to attributes (e.g. age and household composition) of evacuees, the data regarding the number of households, households by age, and population in 2010 were used for this study. The proportion of private households with a member aged 0-5, the proportion of private households with a member aged 65 and over, the proportion of the population aged 0-5, and the proportion of the population aged 65 and over were calculated.

Target area
Twelve municipalities designated as evacuation areas near the Fukushima No. 1 Nuclear Plant were excluded, because the evacuation orders issued there largely determined when people were allowed to return home after the disaster. In grids with an extremely low number of GPS sample users per 1 km grid, we found that the data of individual sample users may strongly affect the estimation results and thereby the entire estimation. Thus, grids where the number of sample users was four or fewer were excluded from this study.

SpM logit model
In this study, we created a model via SpM, using the logistic curve as the link function for the explanatory variables $x$, to describe the home-return rate $P_i$ in grid $i$ on each day after the disaster as a probability satisfying $0 \le P_i \le 1$ (Equation 1):

$$P_i = \frac{1}{1 + \exp\left(-\left(\alpha + \sum_{k} a_k x_{ik}\right)\right)} \qquad (1)$$

where $\alpha$ is the intercept, $x_{ik}$ is the $k$-th explanatory variable in grid $i$, and $a_k$ is its parameter.

For the data compiled in Section 2, it is necessary to conduct modeling (compressed sensing) by compressing the data dimensions (selecting the features), in order to maintain the prediction accuracy on unknown data and to avoid multicollinearity and overfitting. In this study, SpM, which is also applied to extract features in deep learning, is used for compressed sensing. As the SpM method, we used the least absolute shrinkage and selection operator (LASSO). In LASSO, $a$ represents the parameters to be estimated and $p(D \mid a)$ represents the probability of the data analyzed, as in Equation (2):

$$\hat{a} = \arg\min_{a} \left\{ -\log p(D \mid a) + \lambda \lVert a \rVert_1 \right\} \qquad (2)$$

Here, $\lambda$ is the parameter that controls sparsity; when $\lambda = 0$, Equation (2) corresponds to the ordinary log likelihood. With the L1 regularization of LASSO, we can obtain a sparse solution in which the feature structure of the data is appropriately extracted, by selecting $\lambda$ appropriately and pruning variables. In other words, variable selection takes place together with modeling, and the number of variables decreases as the penalty parameter $\lambda$ is made larger. The variable pattern with the smallest error in this process is used.
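To make the role of the L1 penalty concrete, the following is a minimal sketch of an L1-regularized logistic model fitted by proximal gradient descent (gradient step followed by soft-thresholding). It is an illustration under stated assumptions, not the solver used in this study: the function names, the step size, the choice to leave the intercept unpenalized, the use of the mean cross-entropy as the negative log likelihood, and the toy data (one damage-like feature and one unrelated feature) are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic link of Equation (1)."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict_rate(alpha, a, row):
    """Home-return rate for one grid from intercept alpha and coefficients a."""
    return sigmoid(alpha + sum(aj * xj for aj, xj in zip(a, row)))


def fit_sparse_logit(X, y, lam, lr=0.5, n_iter=3000):
    """Minimise mean cross-entropy + lam * ||a||_1 (intercept not penalised).

    X: list of feature rows scaled to [0, 1]; y: observed rates in [0, 1].
    """
    n, k = len(X), len(X[0])
    alpha, a = 0.0, [0.0] * k
    for _ in range(n_iter):
        resid = [predict_rate(alpha, a, row) - yi for row, yi in zip(X, y)]
        g_alpha = sum(resid) / n
        g_a = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(k)]
        alpha -= lr * g_alpha
        # Gradient step on the smooth part, then shrinkage from the L1 penalty.
        a = [soft_threshold(aj - lr * gj, lr * lam) for aj, gj in zip(a, g_a)]
    return alpha, a


# Toy data: column 0 behaves like a damage ratio, column 1 is unrelated noise.
X = [[0.0, 0.5], [0.2, 0.1], [0.4, 0.9], [0.6, 0.3], [0.8, 0.7], [1.0, 0.2]]
y = [0.9, 0.8, 0.7, 0.5, 0.3, 0.2]

alpha_free, a_free = fit_sparse_logit(X, y, lam=0.0)     # no penalty
alpha_sparse, a_sparse = fit_sparse_logit(X, y, lam=10.0)  # very strong penalty
```

With `lam=0.0` the fit is an ordinary logistic fit and the damage-like coefficient comes out negative; with a very large `lam` every coefficient is shrunk to exactly zero and only the intercept remains. Intermediate values of `lam` trade off between these two extremes, which is the mechanism by which LASSO selects variables.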

Model result
Figure 2 shows the results of the modeling via SpM. In Figure 2 (a), the parameter λ that controls sparsity is plotted on the horizontal axis, and the number of nonzero features and the cross-validation RMSE of the resulting SpM are shown. As λ decreases, the number of nonzero features increases and the RMSE decreases. We chose the model with the λ that gives the minimum RMSE. Figure 2 (c) shows the intercept α and the parameters of the explanatory variables x of the developed model. We can estimate the home-return rate on each day after the disaster by applying these parameters to Equation (1). The SpM selected only 9 variables. Variables related to damage, such as home damage (half destroyed and fully destroyed) and tsunami inundation depth, have negative effects on home return. In terms of infrastructure, water supply restoration has a positive impact on home return. In terms of the attributes of the residents, the proportion of children aged 14 and under had a positive impact on the home-return rate. This suggests that privacy conditions or the environment at the shelters were fairly poor. Figure 2 (b) compares the estimated values obtained from the developed model with the true values in terms of deviation rate. The validation showed a strong correlation, with a correlation coefficient of 0.81. The root-mean-square error (RMSE) by cross-validation was 0.037, indicating a margin of error of 5% or lower. The deviation rate was also checked against the 95% confidence interval for each day except day 2. The 95% confidence interval is 0.59<x<0.83, suggesting that a lower estimated home-return rate means a lower accuracy.
These results indicate that the estimation varies right after the disaster, when the home-return rate is low, whereas it is highly accurate when the home-return rate is relatively high (when the extent of damage is small or when many days have passed since the disaster).
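The validation quantities mentioned above (RMSE, correlation coefficient, and the choice of the λ with the minimum cross-validation RMSE) can be computed as in the following sketch. The function names and the numbers passed to them are hypothetical and are not the values obtained in this study.

```python
import math


def rmse(estimated, observed):
    """Root-mean-square error between estimated and observed home-return rates."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)


def pearson_r(estimated, observed):
    """Pearson correlation coefficient between estimated and observed values."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_e * var_o)


def pick_lambda(cv_rmse_by_lambda):
    """Return the lambda whose cross-validation RMSE is smallest."""
    return min(cv_rmse_by_lambda, key=cv_rmse_by_lambda.get)


# Hypothetical cross-validation results: lambda -> RMSE.
best = pick_lambda({0.1: 0.050, 0.01: 0.040, 0.001: 0.045})
```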

Target area
The home-return model constructed in this study was applied to the Nankai Trough earthquake, which is expected to occur in the future. The home-return rate was estimated for each day after the disaster. The target area is Kochi Prefecture in the Shikoku region of Japan, where considerable effects of the Nankai Trough earthquake have been predicted.

Data for damage estimation

Seismic intensity and tsunami inundation data.
In this study, we used "the probabilistic seismic intensity prediction map" for the seismic intensity data. This map manages the relation among the "strength," "period," and "probability" of seismic intensity based on the location, scale, and probability of all earthquakes that occur in and around Japan. The calculation is based on the probability and the extent to which each area shakes, and the distribution is shown on the map. The data are provided by the National Research Institute for Earth Science and Disaster Resilience (NIED) and can be downloaded from the Internet [18]. The maximum seismic intensity was calculated for each 1-km mesh so that the data could be unified with the other data used. Assuming the occurrence of the Nankai Trough earthquake, we used the data with an exceedance probability of 2% in 50 years from "the probabilistic seismic intensity prediction map" (Fig. 3 (a)). The tsunami run-up data are simulation results that assume the occurrence of the Nankai Trough earthquake and were provided by Tohoku University (Fig. 3 (b)). Table 2 summarizes the data. These data are in 10-m grid units and cover 180 min after the disaster (360 scenes at 30 s intervals). Additionally, we calculated the ratio of the estimated tsunami inundation depth at each stage for each 1-km grid using the maximum inundation depth data. Furthermore, the ratio of the population in the tsunami-inundated area was calculated from the resident data of the building micro-geo data.
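As an illustration of how 10-m tsunami scenes can be reduced to 1-km grid variables, the sketch below first takes the maximum depth of each 10-m cell over all scenes and then summarizes the cells of each 1-km grid. The data layout (one dictionary per scene, keyed by cell coordinates in metres of a projected coordinate system), the function names, and the summary statistics are assumptions made for this example; the exact aggregation used in the study is not specified beyond the use of the maximum inundation depth.

```python
from collections import defaultdict

SCENES = 360          # number of simulated scenes
INTERVAL_S = 30       # seconds between scenes
DURATION_MIN = SCENES * INTERVAL_S / 60  # total simulated time in minutes


def max_depth_over_time(scenes):
    """Per-cell maximum inundation depth (m) across all scenes.

    scenes: list of dicts mapping (x_m, y_m) cell coordinates to depth in metres.
    """
    peak = {}
    for scene in scenes:
        for cell, depth in scene.items():
            peak[cell] = max(peak.get(cell, 0.0), depth)
    return peak


def aggregate_to_1km(peak, grid_m=1000):
    """Summarise 10-m cells by the 1-km grid that contains them."""
    groups = defaultdict(list)
    for (x, y), depth in peak.items():
        groups[(x // grid_m, y // grid_m)].append(depth)
    return {
        grid: {
            "max_depth": max(depths),
            "mean_depth": sum(depths) / len(depths),
            "inundated_share": sum(1 for d in depths if d > 0) / len(depths),
        }
        for grid, depths in groups.items()
    }


# Two hypothetical scenes covering three 10-m cells in two 1-km grids.
scenes = [
    {(0, 0): 0.0, (10, 0): 0.5, (1000, 0): 0.0},
    {(0, 0): 1.2, (10, 0): 0.3, (1000, 0): 0.0},
]
grid_stats = aggregate_to_1km(max_depth_over_time(scenes))
```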

Population distribution data.
Data in 1-km grid units from the national census (2015) were used for the population distribution in this study. In addition, some grids had been merged with other grids as a result of confidentiality processing; these meshes were processed by assigning them the same numerical value as the aggregated grid.

Fig. 3. Spatial distribution of the estimated seismic intensity and tsunami inundation depth at the Nankai Trough earthquake. (a) Seismic intensity data for each 1-km grid unit. (b) Estimated tsunami inundation depth for each 1-km grid unit.

Estimation of building damage.
The damage to each building caused by the assumed Nankai Trough earthquake was estimated with the estimation method used by many local governments and was aggregated to the 1-km grid unit. The damage to each building was calculated stochastically and includes damage from tsunami, fire, and earthquake collapse [22]. The damage estimation model uses the fragility curve model of Yamaguchi and Yamazaki [35] for building damage due to seismic intensity and the fragility curve model of Suppasri et al. [26] for building damage due to tsunami. Moreover, in this study, the probabilistic calculations were made for a worst-case scenario, i.e., the assumed occurrence of the earthquake at night, when residents at home would be considerably affected by building damage.
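The stochastic use of a fragility curve can be illustrated as follows. This sketch assumes a lognormal fragility curve in inundation depth, a common functional form for tsunami fragility; the parameters `mu` and `sigma` below are placeholders chosen for the example and are not the published values of the cited models, and the function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fragility_lognormal(depth_m, mu, sigma):
    """Probability of damage given inundation depth, as a lognormal CDF."""
    if depth_m <= 0:
        return 0.0  # no inundation, no tsunami damage
    return normal_cdf((math.log(depth_m) - mu) / sigma)


def sample_damage(depths_m, mu, sigma, rng):
    """Draw a damaged / not damaged outcome for each building."""
    return [rng.random() < fragility_lognormal(d, mu, sigma) for d in depths_m]


def damaged_share(outcomes):
    """Share of damaged buildings, e.g. within one 1-km grid."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


MU, SIGMA = 0.5, 0.6              # placeholder parameters (assumption)
rng = random.Random(0)            # fixed seed so the draw is reproducible
depths = [0.0, 0.5, 1.0, 2.0, 4.0]  # hypothetical building inundation depths (m)
outcomes = sample_damage(depths, MU, SIGMA, rng)
share = damaged_share(outcomes)
```

Repeating the draw many times and averaging `damaged_share` per grid would give an expected damage ratio; a single draw, as here, corresponds to one stochastic scenario.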

Estimating water supply recovery rate.
The water supply restoration rate was estimated with the model proposed by Nojima and Kato [20]. Given the seismic intensity as input, the model estimates the restoration rate of the water supply on each day after a disaster. We estimated the restoration rate of the water supply after the Nankai Trough earthquake by inputting the estimated seismic intensity into this model.

Figure 4 shows the estimated home-return rate for each 1-km grid. The general tendency was that home return was delayed in the tsunami-flooded areas and in the areas with high seismic intensity. Figure 5 shows the average estimated home-return rate for each municipality. Home return was particularly slow in the five municipalities affected by the tsunami: Kochi City, Nankoku City, Tosa City, Susaki City, and Konan City.

Non-tsunami area.
When we investigated the reasons for the slow home return in areas scattered inland, such as Ino and Niyodogawa, we found that the percentages of residents living in partially destroyed buildings and in completely destroyed buildings were large. In addition, the areas where home return was slow, especially in Kochi City, were regions where a large proportion of residents had lived in buildings that were completely destroyed. In other words, in areas not affected by the tsunami, half-destroyed and completely destroyed buildings considerably influenced the home-return rate of the residents.

Tsunami area.
Among the areas affected by the tsunami, Konan City recovered faster than Kochi City and Tosa City. This is attributed to the inundation depth of the tsunami in Konan City, which is shallower than in Kochi City and Tosa City (Figure 6). In other words, in areas affected by the tsunami, the effect of the tsunami on the home-return rate is considerable and the home-return rate immediately after the disaster is low. However, the subsequent recovery was observed to be fast.
Using these results, we could support the decision making of local governments by providing, prior to the disaster, predictions of how many people will evacuate and how many people will return home on each day.

Conclusion
In this study, using the damage situation of the Great East Japan Earthquake, we constructed a model that estimates the home-return rate for each day after the disaster and the time taken to return home. The proposed model shows a margin of error of 5% or lower (cross-validation RMSE of 0.037). In addition, we linked the damage situation of individual homes to large-scale human flow data and examined the relationship between home-return status and home damage status.
Moreover, we applied the home-return model to damage estimation results that assume the occurrence of the Nankai Trough earthquake, and predicted the home-return status in each region as a time series. By using this result to understand the status of evacuees over time, it becomes possible to take measures such as estimating the amount of material needed for evacuation shelters, examining the size of evacuation shelters, and arranging human assistance. With this model, long-term home-return estimates can also be expected to be produced by applying real-time damage observation data immediately after an actual disaster, and not only on the basis of advance estimation.
This study has some limitations, and certain issues remain for future work. First, some important parameters were assumed in this study. This issue can be solved by considering additional data. For example, lifeline damage in this study considered only water and sewage systems; however, home return also depends on various other factors such as electricity, gas, and destroyed road infrastructure. For this reason, considering the damage caused by power outages is necessary so that the return of evacuees can be estimated more accurately. Second, reconstruction policies and disaster prevention measures differ by region; therefore, developing a method to incorporate these effects into the model is necessary. Third, we must evaluate the reliability of the model by applying it to the Kumamoto earthquake, which occurred in 2016, and to other past earthquakes.