Extraction of The Spatio-temporal Activity Patterns Using Laser-scanner Trajectory Data

A pedestrian tracking system based on highly accurate laser scanners is an effective means of understanding how facility space is used. While such a system can gather an enormous volume of tracking data, specialized skills and significant labor are needed to obtain a reliable bird's-eye view of the spatio-temporal characteristics of the observed data. In this paper, two methods for extracting patterns of spatio-temporal activity are described. They provide a broad overview of office workers' activities throughout a workday and an easily understood visualization that indicates in what time segment, at what location, and what activities are taking place. One is a time segment extraction model that identifies characteristic time intervals in the time series data of office workers' activities using a classification model based on information loss minimization. The other is a day scene extraction model that identifies daily scenes from simultaneous behavior patterns in spatio-temporal distributions using a latent class model with PLSI (probabilistic latent semantic indexing). These methods provide viewpoints for separating the activities of a workday into time segments of appropriate size in order to grasp how the activities vary with the time of day. Simultaneous behavior patterns in time, space, and activity are extracted, thereby allowing representation of typical scenes such as morning meetings and extended conversations between co-workers.

In this study, we focus on three factors representing information about activities: temporal information (timed data distribution), spatial information (spatially meshed data distribution), and walking speed, which represents "Staying/Moving" activity information. We then propose a computational model for extracting characteristic spatio-temporal activity patterns that classifies the data into appropriately sized groups so that similar data distributions are grouped together. A method for representing the above information about activities is then investigated.
The remainder of this paper is organized as follows. In Section 2, we describe a model that extracts the time segments of particularly high activity during the day in an office (time segment extraction model, Fig. 2). By reducing the number of divisions of the tabulated data and thereby compacting the information, the time segments when workers gather to discuss matters or for other purposes are extracted, thus easing the interpretation of the time segments and the selection of representative diagrams. In Section 3, we describe a model representing the extraction of "scenes" in the office, assuming the co-occurrence of temporal and spatial information (day scene extraction model, Fig. 3). Specifically, a procedure for capturing concepts from the co-occurrence of documents and words used in the natural language processing field is applied to extract scenes as latent classes from the co-occurrence of temporal information, spatial information, and velocity information. The characteristic spatio-temporal patterns are also represented.

Previous research and the theme of this study
Matsushita et al. [2] conducted a questionnaire survey of chronological patterns in office activities and analyzed the continuity and patterns of office worker behavior in chronological order. As a result, workers were categorized according to what activities they performed at what times. Although that study shares our motivation of classifying office activities using time-series information, our study uses more specific trajectory data to categorize time segments in the office, in contrast to the ambiguous location information recorded in a questionnaire.
Sano and Watanabe [3] proposed a method for expressing spatio-temporal relationships in the trajectories of pedestrian crowds. This was an attempt to express the temporal elements of crowd walking behavior, which are difficult to understand on an x-y display. Although the motivation of expressing temporal and spatial attributes is shared, the walking characteristics considered, the timescales, and the time segments differ. While the previous paper dealt with the characteristics of crowd walking, especially flow over a short time, we express moving and staying across all-day activities as activity characteristics of office workers.
Nathan and Pentland [4] proposed a method of identifying people's daily eigenbehaviors to extract the time series patterns inherent in daily activities by collecting long-term, stable life logs with mobile phones. Although the motivation of expressing temporal, spatial, and social attributes is shared, their location data were labeled only with "Work / Home / Else / No signal / Off", so the granularity of location information differs significantly from that of our study. Similarly, Sato et al. [5] proposed a behavioral model of office workers using indoor infrared position data labeled with "PC operation / Meeting / Else" to extract worker behavioral patterns from long-term behavioral time series data. While these studies used mobile sensors to collect personal location data aggregated in units of individual activity, our study uses laser scanning sensors for indoor positioning and tracking of a large number of unspecified people; the measurement methods thus differ. In addition, a pedestrian tracking system using laser scanners has many operational advantages over other indoor position tracking systems, such as those based on Bluetooth, including extremely high accuracy and no need for individual sensors, which helps maintain privacy.

Overview of time segment extraction model
When the time series data for the group of points representing measurements in the area of interest are represented as office activity levels, the format is easier to interpret after the data have been somewhat summarized in terms of time period than as a series of detailed bar graphs. While the summarization process provides a simple and convenient way to divide a length of time into one-hour segments, it is not necessary to employ time segments of equal length in order to represent the situations and activities in an office. Instead, one can expect higher explanatory power when segment lengths are adjusted to fit the activity levels. However, this requires the development of a bottom-up procedure for binning continuous quantities such as histograms.
The ChiMerge method [6] is a discretization algorithm for continuous variables in which binning is determined by repeated chi-squared testing of the attribute distributions in neighboring intervals, based on information about the classes to which the data belong. Unfortunately, it is difficult to apply this method to time series data of a single attribute because binning is determined by discontinuities in the component ratio. Therefore, for this study, from the viewpoint of data quantity statistics, we conceived a classification method that determines the class boundary values so that as much as possible of the original data (narrow segments) is preserved.
Generally, compression of a finely aggregated data group for display means that some of the original data are lost. Osaragi [7] realized that minimizing the loss of this information would reduce misjudgments and proposed a method that compares the mean data quantity when distinguished by the smallest unit (I0) with the mean data quantity under a new classification (I) in order to obtain an equation for the data loss rate, and then determines the class boundary values so as to minimize this rate. Here, a classification model is conceived in which that segmentation method is applied to histograms of M classes tabulated in units of minimum time length, and these are compressed to the number of classes, N, desired by the analyst (M > N).

Fig. 4. Example of analysis notation of pedestrian trajectory in the office
Let us make some observations using the measured data in Fig. 1. Figure 4 is a plot of the measured values in which the horizontal axis is time and the vertical axis is walking speed. It is difficult to identify any trends by examining the raw data, but if they are re-represented in contour form using kernel density estimation, one can see that speeds can be broadly divided into two distributions, one centered around 1.2 m/s and the other concentrated at or below 0.3 m/s. If we understand the former group as representing "moving" within the office, and the latter group as representing "staying" while workers are standing and speaking to each other, the biases of these distributions may be of use for explaining the characteristics of office activities. It is also possible to apply the classification model to the temporal data distribution alone, but a distribution model that also reflects the activity characteristics of staying vs. moving (i.e., the walking speed distribution) has greater explanatory power. Therefore, we classify using a "walking speed × time" cross-tabulation.
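The two-mode structure of the walking speed distribution can be illustrated with a one-dimensional Gaussian kernel density estimate. The following is a minimal sketch on synthetic data, not the analysis used in this paper: the function names `speed_density` and `top_modes`, the bandwidth of 0.1 m/s, and the synthetic sample (one cluster near 0.2 m/s, one near 1.2 m/s) are all assumptions made for illustration.

```python
import numpy as np

def speed_density(speeds, grid, bandwidth=0.1):
    """Gaussian kernel density estimate of walking speed (m/s) on `grid`."""
    speeds = np.asarray(speeds, dtype=float)
    u = (grid[:, None] - speeds[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(speeds) * bandwidth)

def top_modes(grid, density, k=2):
    """Grid positions of the k highest interior local maxima, ascending."""
    inner = np.arange(1, len(grid) - 1)
    is_peak = (density[inner] > density[inner - 1]) & (density[inner] >= density[inner + 1])
    peaks = inner[is_peak]
    best = peaks[np.argsort(density[peaks])[::-1][:k]]
    return np.sort(grid[best])

# Synthetic speeds (assumed for illustration): a "staying" cluster and a "moving" cluster.
rng = np.random.default_rng(0)
speeds = np.concatenate([
    np.abs(rng.normal(0.2, 0.05, 500)),   # staying, roughly below 0.3 m/s
    rng.normal(1.2, 0.15, 500),           # moving, around 1.2 m/s
])
grid = np.linspace(0.0, 2.0, 201)
modes = top_modes(grid, speed_density(speeds, grid))
print(modes)  # two mode positions, one per cluster
```

With measured data, the bandwidth would need to be tuned, since too small a value produces spurious peaks and too large a value merges the two groups.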
Classification is performed by combining rows (walking speed) and columns (time) in a cross-tabulation table and then recreating the table in a more compact form. The boundary values for the histogram are set before modifying the classification. The histogram is obtained after the modification by calculating the mean values, dividing the total number of data values in each single class by the width of the class. Here, the method used for recreating the rows and columns is to repeat the task of inducing a lower and lower data quantity loss rate while stochastically varying the selected boundary values up and down.
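The re-tabulation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `rebin_crosstab` and the boundary-list representation are hypothetical, and the "width" of a two-dimensional class is taken here to be the number of fine cells it covers (row width × column width).

```python
import numpy as np

def rebin_crosstab(table, row_bounds, col_bounds):
    """Merge a fine cross-tabulation into coarser classes.

    row_bounds / col_bounds list the fine-cell indices where classes start,
    plus the final end index, e.g. [0, 1, 3] gives classes [0:1] and [1:3].
    Returns (totals, mean_values), where mean_values is the class total
    divided by the number of fine cells in the class.
    """
    table = np.asarray(table, dtype=float)
    n_r, n_c = len(row_bounds) - 1, len(col_bounds) - 1
    totals = np.zeros((n_r, n_c))
    means = np.zeros((n_r, n_c))
    for i in range(n_r):
        r0, r1 = row_bounds[i], row_bounds[i + 1]
        for j in range(n_c):
            c0, c1 = col_bounds[j], col_bounds[j + 1]
            totals[i, j] = table[r0:r1, c0:c1].sum()
            means[i, j] = totals[i, j] / ((r1 - r0) * (c1 - c0))
    return totals, means

# Example: a 3 x 4 table (rows: speed units, columns: time units) merged to 2 x 2.
fine = np.arange(12).reshape(3, 4)
totals, means = rebin_crosstab(fine, row_bounds=[0, 1, 3], col_bounds=[0, 2, 4])
```

Because class widths are unequal after merging, comparing the mean values rather than the raw totals keeps wide classes from appearing more active merely because they are wide.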
The actual task consists of the following:
① The minimum data segments (for example, time, 10-minute units; velocity, 100 mm/s units) are set and initial tabulation is conducted at these segments.
② The number of classes in the recreated table is set to the desired value.
③ The initial values at the boundaries of each class are set at random.
④ The boundaries to be varied are selected at random. Additionally, whether the boundaries of the rows or the boundaries of the columns will be varied is also set stochastically.
⑤ The selected boundaries are shifted stochastically to the determined locations and the data quantities are calculated.
⑥ The data quantities are calculated in the search ranges within the boundaries of the preceding and succeeding classes and induced stochastically toward the boundaries in the locations where the data quantity loss rate is the lowest.
⑦ Processes ④ to ⑥ above are repeated until the data quantity loss rate converges to a sufficiently low amount (Fig. 5).
Hereinafter, the classes of time distributions obtained as above are called time segments, and the classes of velocity distributions are called velocity segments.
AGILE: GIScience Series, 1, 2020. Full paper Proceedings of the 23rd AGILE Conference on Geographic Information Science, 2020. Editors: Panagiotis Partsinevelos, Phaedon Kyriakidis, and Marinos Kavouras This contribution underwent peer review based on a full paper submission. https://doi.org/10.5194/agile-giss-1-9-2020 | © Authors 2020. CC BY 4.0 License.
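Steps ③ to ⑦ can be sketched for a single axis as a simple stochastic search. This is a simplified illustration and not the procedure of [7]: the loss used here is the Kullback-Leibler divergence between the normalized fine histogram and its piecewise-constant approximation, which is an assumed stand-in for the data quantity loss rate, and the names `info_loss` and `search_boundaries` are hypothetical. The two-axis case would additionally choose at random whether a row or a column boundary is moved.

```python
import numpy as np

def info_loss(hist, bounds):
    """Assumed loss: KL divergence between the fine histogram and the
    histogram flattened to its mean value inside each class."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    q = np.empty_like(p)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def search_boundaries(hist, n_classes, n_iter=2000, seed=0):
    """Randomly perturb one boundary at a time, keeping moves that do not
    increase the loss (steps 3 to 7 for one axis)."""
    m = len(hist)
    if not 2 <= n_classes <= m:
        raise ValueError("need 2 <= n_classes <= number of fine bins")
    rng = np.random.default_rng(seed)
    # Step 3: random initial interior boundaries.
    interior = np.sort(rng.choice(np.arange(1, m), size=n_classes - 1, replace=False))
    bounds = np.concatenate(([0], interior, [m]))
    best = info_loss(hist, bounds)
    for _ in range(n_iter):
        k = rng.integers(1, n_classes)                    # step 4: pick a boundary
        lo, hi = bounds[k - 1] + 1, bounds[k + 1] - 1     # range between neighbours
        cand = bounds.copy()
        cand[k] = rng.integers(lo, hi + 1)                # step 5: shift it
        loss = info_loss(hist, cand)
        if loss <= best:                                  # step 6: keep if not worse
            bounds, best = cand, loss
    return bounds, best

# Example: three plateaus are recovered as three classes.
bounds, loss = search_boundaries([10, 10, 10, 1, 1, 1, 10, 10, 10], n_classes=3)
```

A fixed iteration count is used here in place of an explicit convergence test; in practice the loop would stop once the loss no longer decreases.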

Visualization representation by time segment division model
Once appropriate time segments have been obtained via the method described in the previous subsection, they are interpreted by examining the characteristics of the walking speed and time distributions. The character of each time segment is identified by selecting a typical analytical diagram (trajectory map and kernel distribution map; see Fig. 1) from among the diagrams in the time segment.
Trial and error based on the representation desired by the analyst is a necessary part of determining the selected number of classes. Let us suppose that the analyst wishes to represent the two peaks of immobility and motion. For example, with two classes, the actual calculation yields a data distribution with a bias that prevents representation of the desired distinction between the peaks. Therefore, the minimum number of classes, which will then be used to provide the desired distribution map, must be identified with a search involving various appropriate numbers of classes.

Previous research and theme of this study
Ishigaki et al. [8] carried out simultaneous classifications of customers and products based on the customer lifestyles in order to create a behavior model that accounted for dependency on circumstances. They applied probabilistic latent semantic indexing (PLSI) [9], which is used in the natural language processing field, to large historical datasets describing customer behavior in large discount stores and to customer survey data. Their technique was used in this study to carry out simultaneous classifications on temporal and spatial datasets obtained by laser metrology. To facilitate this, we created a model for extracting "scenes" that express patterns in the usage of time and space in office spaces.
While conventional PLSI addresses the co-occurrence of two variables, documents (D) and words (W), Takamura et al. [10] proposed an extension of PLSI that can handle the co-occurrence of three variables, commonly referred to as the 3-PLSI model or the triangular model. Their model was examined for this study in order to establish a model describing latent variables using three-variable co-occurrence data in which velocity information indicating motion or immobility is added to the two variables of temporal and spatial information.

Overview of day scene extraction model
The variables of time d∊D={d1,…,dmax}, space w∊W={w1,…,wmax}, velocity data v∊V={v1,…,vmax}, and the latent variable scene z∊Z={z1,…,zmax} are defined here. Time d lies within a time segment corresponding to a 10-minute segment map, space w is a cell of the spatial mesh in a segment map, and velocity v is classified into low-speed, medium-speed, or high-speed motion in the office (see the example in Fig. 1). Figure 6 is a graphical representation of the interdependence of the variables. Note that (a) is a two-variable model excluding velocity information; since we wish to investigate a three-variable model in this study, we examine relationships (b) through (d), which show differing dependencies for space w and velocity v. If these two variables occur independently, the situation corresponds to model (b). However, an independence test (chi-squared test) yields a quite low p-value, so independence is rejected and we must consider model (c) or (d). In this study, we employ model (d), which describes what kind of behavior is likely to occur in a given location. In other words, an observation is generated through the following process: a time d is selected with probability P(d); scene z occurs at d with conditional probability P(z|d); z occurs in space w with probability P(w|z); and velocity v (walking speed) occurs in that scene and space with probability P(v|w,z). This can be considered a model of observation of the tabulated data N(d,w,v) and is formulated as
$$P(d,w,v) = P(d) \sum_{z \in Z} P(z \mid d)\, P(w \mid z)\, P(v \mid w,z) \qquad (1)$$
Using Bayes' theorem, we transform Eq. (1) as follows:
$$P(d,w,v) = \sum_{z \in Z} P(z)\, P(d \mid z)\, P(w \mid z)\, P(v \mid w,z) \qquad (2)$$
In order to obtain the most likely probability distributions, we must find the parameters that maximize the following logarithmic likelihood function, which incorporates the tabulated data N(d,w,v):
$$L = \sum_{d \in D} \sum_{w \in W} \sum_{v \in V} N(d,w,v) \log P(d,w,v)$$
The parameters that maximize this logarithmic likelihood L can be estimated iteratively with the expectation-maximization (EM) algorithm, in which an expectation step (E-STEP) alternates with a maximization step (M-STEP).

[Expectation step (E-STEP)]
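The conditional probability of the latent variables under the current parameter estimates (randomized initial values in the first iteration) can be calculated with the following:
$$P(z \mid d,w,v) = \frac{P(z)\, P(d \mid z)\, P(w \mid z)\, P(v \mid w,z)}{\sum_{z' \in Z} P(z')\, P(d \mid z')\, P(w \mid z')\, P(v \mid w,z')}$$
On this basis, the expected value of the likelihood of the model is calculated, and the convergence test is conducted.
[Maximization step (M-STEP)]
From the Lagrange undetermined multiplier method, the parameters maximizing the expected value for the logarithmic likelihood found in the E-STEP can be found as follows:
$$P(z) = \frac{\sum_{d,w,v} N(d,w,v)\, P(z \mid d,w,v)}{\sum_{d,w,v} N(d,w,v)}$$
$$P(d \mid z) = \frac{\sum_{w,v} N(d,w,v)\, P(z \mid d,w,v)}{\sum_{d',w,v} N(d',w,v)\, P(z \mid d',w,v)}$$
$$P(w \mid z) = \frac{\sum_{d,v} N(d,w,v)\, P(z \mid d,w,v)}{\sum_{d,w',v} N(d,w',v)\, P(z \mid d,w',v)}$$
$$P(v \mid w,z) = \frac{\sum_{d} N(d,w,v)\, P(z \mid d,w,v)}{\sum_{d,v'} N(d,w,v')\, P(z \mid d,w,v')}$$
All the parameters can be estimated by iterating until convergence. Rather than the ordinary EM algorithm, a tempered EM algorithm is used in actual calculations because it incorporates an entropic term that helps avoid estimation errors caused by excessive confidence in the posterior probabilities of the hidden variables during the calculations. Specifically, parameter β (> 0), the reciprocal of temperature, is introduced into the posterior computed in the E-STEP to provide this correction.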
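The alternation of E-STEP and M-STEP for model (d) can be sketched in NumPy as below. This is an illustrative sketch rather than the implementation used in this study: the function name `fit_scenes`, the array layout, the small constant `eps`, the fixed iteration count, and in particular the choice to raise the whole joint term to the power β in the tempered E-STEP are assumptions (other tempered EM variants leave P(z) untempered). Setting `beta=1.0` gives ordinary EM.

```python
import numpy as np

def fit_scenes(counts, n_scenes, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z), P(v|w,z) to counts N(d,w,v) by (tempered) EM."""
    counts = np.asarray(counts, dtype=float)
    n_d, n_w, n_v = counts.shape
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero and log(0)

    def normalize(a):
        # Normalize over the first axis, which holds the conditioned variable.
        return a / a.sum(axis=0, keepdims=True)

    # Randomized initial values.
    p_z = np.full(n_scenes, 1.0 / n_scenes)
    p_d_z = normalize(rng.random((n_d, n_scenes)))
    p_w_z = normalize(rng.random((n_w, n_scenes)))
    p_v_wz = normalize(rng.random((n_v, n_w, n_scenes)))
    loglik = []
    for _ in range(n_iter):
        # joint[d,w,v,z] = P(z) P(d|z) P(w|z) P(v|w,z)
        joint = np.einsum('z,dz,wz,vwz->dwvz', p_z, p_d_z, p_w_z, p_v_wz)
        loglik.append(float((counts * np.log(joint.sum(axis=3) + eps)).sum()))
        # E-STEP: tempered posterior P(z|d,w,v) (beta placement is an assumption).
        powered = joint ** beta
        post = powered / (powered.sum(axis=3, keepdims=True) + eps)
        # M-STEP: re-estimate each distribution from the weighted counts.
        resp = counts[..., None] * post                   # shape (D, W, V, Z)
        p_z = resp.sum(axis=(0, 1, 2))
        p_z = p_z / p_z.sum()
        p_d_z = normalize(resp.sum(axis=(1, 2)) + eps)    # shape (D, Z)
        p_w_z = normalize(resp.sum(axis=(0, 2)) + eps)    # shape (W, Z)
        p_v_wz = normalize(resp.sum(axis=0).transpose(1, 0, 2) + eps)  # (V, W, Z)
    return {"p_z": p_z, "p_d_z": p_d_z, "p_w_z": p_w_z,
            "p_v_wz": p_v_wz, "loglik": loglik}

# Example on a small random count table (6 time slots, 5 mesh cells, 3 speeds).
demo_counts = np.random.default_rng(1).poisson(3.0, size=(6, 5, 3))
model = fit_scenes(demo_counts, n_scenes=2)
```

The sketch materializes the full D × W × V × Z array for clarity; for large meshes a sparse or blockwise formulation would be preferable.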

Visualization representation by day scene extraction model
P(w|z), P(d|z), and P(v|w,z) are obtained as described in the previous section so the spatial distribution corresponding to each scene z is represented by the contour plot of N×P(w|z). In the same way, the temporal distribution is represented by the time series bar graph for N×P(d|z), and the velocity distribution is represented by the component ratio graph for P(v|z) (here, P(v|z)=∑P(v|w,z)P(w|z) and N is the total number of measured data). Last, the day scenes are interpreted while observing these three characteristic distributions.
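The three per-scene distributions can be computed from the estimated parameters as in the following sketch. The function name `scene_summaries` and the array shapes (P(w|z) as W × Z, P(d|z) as D × Z, P(v|w,z) as V × W × Z) are assumptions made for illustration.

```python
import numpy as np

def scene_summaries(p_w_z, p_d_z, p_v_wz, n_total):
    """Return N*P(w|z), N*P(d|z) and P(v|z) = sum_w P(v|w,z) P(w|z)."""
    spatial = n_total * np.asarray(p_w_z)      # contour plot input, shape (W, Z)
    temporal = n_total * np.asarray(p_d_z)     # bar graph input, shape (D, Z)
    p_v_z = np.einsum('vwz,wz->vz', p_v_wz, p_w_z)  # component ratios, shape (V, Z)
    return spatial, temporal, p_v_z

# Example with random, properly normalized distributions.
rng = np.random.default_rng(0)
p_w_z = rng.random((4, 2)); p_w_z /= p_w_z.sum(axis=0)
p_d_z = rng.random((5, 2)); p_d_z /= p_d_z.sum(axis=0)
p_v_wz = rng.random((3, 4, 2)); p_v_wz /= p_v_wz.sum(axis=0)
spatial, temporal, p_v_z = scene_summaries(p_w_z, p_d_z, p_v_wz, n_total=1000)
```

Each column of `p_v_z` sums to one, so it can be drawn directly as a component ratio graph for the corresponding scene.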
Since the spatial mesh must be represented in one dimension, it was numbered, beginning from one end. In order to ensure consistency of the data representing spatial information, the data were smoothed using kernel density estimation (kernel width: 500 mm). The data were then meshed and tabulated (Fig. 7).
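The meshing and tabulation into N(d,w,v) can be sketched as follows. This sketch omits the kernel smoothing step and rests on several assumptions: the function name `tabulate`, row-major numbering of the mesh from one corner, coordinates measured in millimetres from that corner, and speed class edges of 300 and 900 mm/s for the low, medium, and high classes.

```python
import numpy as np

def tabulate(t_sec, x_mm, y_mm, speed_mm_s, n_x, n_y, n_d,
             t0_sec=0.0, mesh_mm=500.0, slot_sec=600.0,
             speed_edges=(300.0, 900.0)):
    """Count observations per (10-minute slot d, mesh cell w, speed class v)."""
    t = np.asarray(t_sec, dtype=float)
    x = np.asarray(x_mm, dtype=float)
    y = np.asarray(y_mm, dtype=float)
    s = np.asarray(speed_mm_s, dtype=float)
    d = ((t - t0_sec) // slot_sec).astype(int)
    ix = (x // mesh_mm).astype(int)
    iy = (y // mesh_mm).astype(int)
    w = iy * n_x + ix                       # one-dimensional mesh number
    v = np.digitize(s, speed_edges)         # 0: low, 1: medium, 2: high
    # Drop points outside the time range or outside the mesh.
    keep = (d >= 0) & (d < n_d) & (ix >= 0) & (ix < n_x) & (iy >= 0) & (iy < n_y)
    counts = np.zeros((n_d, n_x * n_y, len(speed_edges) + 1), dtype=int)
    np.add.at(counts, (d[keep], w[keep], v[keep]), 1)
    return counts

# Example: four points on a 3 x 2 mesh with two time slots.
counts = tabulate(t_sec=[0, 599, 600, 1200], x_mm=[0, 499, 500, 1000],
                  y_mm=[0, 0, 500, 0], speed_mm_s=[100, 300, 899, 900],
                  n_x=3, n_y=2, n_d=2)
```

`np.add.at` is used instead of fancy-index assignment so that repeated (d, w, v) indices are accumulated rather than overwritten.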

Validation of proposed model using measured data

Laser-based people tracking systems
In this paper, we use several laser scanners (Hokuyo UTM-30LX) with several PCs for data logging. The scanners are arranged in the office space in a balanced layout that reduces occlusions. A horizontal plane at a height of approximately 140 cm (roughly the chest height of a typical pedestrian) is scanned, and the workers' locations are extracted using a laser point clustering method. As a result, serial trajectory data, including the time, the X-Y coordinates of the position, and a unique trajectory ID, can be obtained. Since the system records only these data and no images, it is well suited to facilities where privacy must be protected. With this measurement method, workers seated below the scanning plane cannot be detected, although a model that can estimate and interpolate the occupancy status was proposed in a previous paper [1]. Note that the discussion in this paper is based only on standing-state data.
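Walking speed is not listed among the recorded fields, so it has to be derived from the trajectory data. One plausible way, shown here as an assumption rather than as the procedure used in this study, is to divide the distance between consecutive points of the same trajectory ID by the elapsed time. The function name `walking_speeds` and the units (seconds, millimetres) are hypothetical.

```python
import numpy as np

def walking_speeds(traj_id, t_sec, x_mm, y_mm):
    """Speed (mm/s) at each sample from the previous sample of the same
    trajectory; the first sample of every trajectory gets NaN."""
    traj_id = np.asarray(traj_id)
    t = np.asarray(t_sec, dtype=float)
    x = np.asarray(x_mm, dtype=float)
    y = np.asarray(y_mm, dtype=float)
    order = np.lexsort((t, traj_id))          # sort by ID, then by time
    ids, ts, xs, ys = traj_id[order], t[order], x[order], y[order]
    dt = np.diff(ts)
    dist = np.hypot(np.diff(xs), np.diff(ys))
    valid = (ids[1:] == ids[:-1]) & (dt > 0)  # same trajectory, positive time step
    speed_sorted = np.full(len(ts), np.nan)
    speed_sorted[1:][valid] = dist[valid] / dt[valid]
    speed = np.empty_like(speed_sorted)
    speed[order] = speed_sorted               # restore the input order
    return speed

# Example: two interleaved trajectories.
speeds = walking_speeds(traj_id=[1, 1, 2, 2, 1], t_sec=[0, 1, 0, 2, 2],
                        x_mm=[0, 300, 0, 0, 300], y_mm=[0, 400, 0, 2400, 400])
```

In measured data the raw positions are noisy, so speeds would normally be smoothed over a short window before being classified as staying or moving.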

Results of classification and visualization of time segment extraction model
In order to validate the two proposed representation methods, the measured data shown in Fig. 1 were classified and visualized.
The capability of the time segment extraction model was validated using raw data incorporating 2,688 segments, consisting of 84 segments of minimum time units of 10 minutes and 32 segments of minimum velocity units of 100 mm/s. The simultaneous classification method described in Section 2 was carried out to classify the time and walking speed zones.
The first step was to determine the number of segments created after correction of the classification by examining how the classification process varies under the benchmarks. The benchmarks are that the classification shall ① be able to distinguish between the two distribution shapes in the velocity zone for immobility and motion, and ② be able to represent the four immobility peaks shown in the time segment in Fig. 4. Figure 9 shows the final classifications for the velocity and time segment distributions and how the data distributions in the "re-binned" segments varied as the numbers of permitted segments were increased. When the number of velocity segments was increased from two, the distribution first split into two peaks when six segments were permitted; therefore, the final number was six. In the same way, four peaks of immobility were identified once the number of time segments had been increased to 11, and this was selected as the final number.
Proceeding from the above results, Fig. 10a presents the combined results of the final classifications as a velocity × time distribution map, and Figs. 10b and 10c show the boundary values at which the calculations converged. As shown, the time segments were well extracted and are depicted in the data distributions.
The locations where staying (standing and talking) tends to occur and where movement occurs (main routes) can be extracted from the results of this classification into velocity zones. Mapping the spatial distribution of the velocity zones (α-ζ) reveals similarities in the shapes of some of the maps. The correlation coefficients between the spatial distribution maps of the various velocity zones were found and those with high correlations (coefficients ≥0.7) were combined into single zones. The results are shown in Fig. 9d, represented by three new velocity zones: Staying (low velocity), 0-300 mm/s (α and β); Moving (medium velocity), 300-900 mm/s (γ); Moving fast (high velocity), ≥900 mm/s (δ-ζ).
Figure 11 presents the kernel density map for the spatial distributions of these new velocity zones. The following are clearly visible in these distribution maps: (i) Staying zones, where people stand and talk, in many locations (or for long periods); (ii) Moving zones (medium velocity) in often-used subroutes or "routes of approach" (for example, to copy machines); and (iii) Moving fast zones (high velocity) in main routes for people on their way to large areas.
One can also see other phenomena in these graphs. The routes on the upper side are clearly favored for traveling to the large main areas, whereas there was little traffic near the management's desks on the lower side and little traffic between the right and left sides across the group of unused desks in the center of the office.
Let us turn to some observations of the characteristics of behavior at different times of the day. To accomplish this, component ratio relationship diagrams were constructed for motion and immobility for each zone (Fig. 10e). The characteristics of behavior were interpreted in each time segment from these graphs and the results are given in Fig. 10f. If typical diagrams (Fig. 12), which are the distributions (trajectory maps and kernel density maps) for the main time segments, are considered, further observations can be made.
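The merging of velocity zones by map similarity can be sketched as follows. This is a minimal illustration under assumptions: the function name `merge_similar_zones` is hypothetical, each spatial distribution map is flattened to a vector of mesh-cell counts, and only neighboring velocity zones are compared, so that merged zones remain contiguous speed ranges.

```python
import numpy as np

def merge_similar_zones(maps, threshold=0.7):
    """Group consecutive velocity zones whose spatial maps have a Pearson
    correlation of at least `threshold`, and sum the maps in each group."""
    maps = np.asarray(maps, dtype=float)
    corr = np.corrcoef(maps)                 # zone-by-zone correlation matrix
    groups = [[0]]
    for k in range(1, len(maps)):
        if corr[k - 1, k] >= threshold:
            groups[-1].append(k)             # similar to the previous zone
        else:
            groups.append([k])               # start a new merged zone
    merged = np.array([maps[g].sum(axis=0) for g in groups])
    return groups, merged

# Example: six zones over six mesh cells (synthetic counts).
zone_maps = [
    [5, 4, 3, 0, 0, 0],
    [6, 4, 2, 0, 1, 0],
    [0, 1, 5, 5, 1, 0],
    [0, 0, 0, 2, 4, 6],
    [0, 0, 1, 2, 5, 6],
    [1, 0, 0, 3, 4, 7],
]
groups, merged = merge_similar_zones(zone_maps)
print(groups)  # → [[0, 1], [2], [3, 4, 5]]
```

In this synthetic example the six zones collapse into three groups, mirroring the pattern of two low-velocity zones, one medium-velocity zone, and three high-velocity zones.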
For example, moving is the key activity in time segment #1. This can be interpreted as the time of arrival at work. A closer look at this trajectory map reveals significant attention is paid to the bulletin board (which displayed the work schedule). Time segment #10 can be interpreted as representing motion flows during overtime, since the main feature there was worker immobility. While it is difficult to read any meaning from some of the trajectory maps, through the kernel density maps, it is possible to intuitively grasp that some long-term exchanges were occurring. Thus, integrating the characteristics of the time segment classification results

Results of classification of day scene extraction model
The classification characteristics of the day scene extraction model were validated using 84 classes of minimum time units of 10 minutes for time D, 2,285 zones of mesh unit size 50 cm × 50 cm for space W, and the three velocity zones of low, medium, and high velocities. A cross-tabulation was carried out and the day scenes (latent zones) were estimated by the method described in Section 3. No clear standard could be set for the number of zones in day scene (z), so 12 zones were chosen by trial and error. To avoid local optima, the calculation was run multiple times and the result with the maximum logarithmic likelihood was adopted. Figures 13a-c show the estimated spatial, time, and velocity distributions corresponding to the day scenes (latent zones). The scenes have been rearranged and numbered in descending order of P(z). We can extract characteristics and interpret the scenes from these three distributions.
[Scene 1] The moving zones (high velocity) velocity distributions are large and occupy wide time segments throughout the workday. Since the spatial distributions represent the main routes and subroutes in the office, they can be interpreted as "moving zones (all day)".
[Scene 2] The staying zones (low velocity) velocity distributions are large and concentrated into approximately 1 hour, around 8:00, in the time distribution. Location A, where they are concentrated in the spatial distribution, is where three employees stood and talked for a long time during the overtime period. Thus, we can interpret this as "Standing and talking #1" (Location A: at overtime hours).
[Scene 4] The staying zones (low velocity) velocity distributions are large and concentrated into a short period around 17:30 in the time distribution. The regular evening standup meeting is held at that time, so many employees gathered in one location, and this can be interpreted as that meeting.
[Scene 5] This strongly resembles Scene 3, with many people standing still near the copy machine, so it can be interpreted as "Repeated scene #2".
[Scene 6] This is of smaller scale than Scene 4, but standup meetings can be seen in several locations around 9:00, so this is interpreted as depicting the morning assembly.
[Scene 7] Two meetings were seen, in the morning and the afternoon, at the same location (Location C). These can be interpreted as "Standing and talking #2 (Location C)".
Here, we will omit the details about Scenes 9 to 12, but they depict scenes of standing and talking. The above results demonstrate that the conditional probability values of latent variables (scenes) can be used to represent the characteristics of temporal, spatial, and walking speed (staying/moving) distributions, and these characteristics allow visual extraction of scenes of employees standing and talking, of regular meetings (morning and evening assembly), and other activities.

Summary and Conclusion
A pedestrian tracking system based on highly accurate laser scanners is an effective means of understanding how facility space is used. While such a system can gather an enormous volume of tracking data, specialized skills and significant labor are needed to obtain a reliable bird's-eye view of the spatio-temporal characteristics of the observed data. In this paper, two methods for extracting patterns of spatio-temporal activity were described. They provide a broad overview of office workers' activities throughout a workday and an easily understood visualization that indicates in what time segment, at what location, and what activities are taking place.
One is a time segment extraction model that identifies characteristic time segments in time series data of office workers' activities using a classification model based on information loss minimization. The other is a day scene extraction model that identifies daily scenes from simultaneous behavior patterns in spatio-temporal distributions using a latent class model with PLSI (probabilistic latent semantic indexing). It was shown that incorporating walking speed information to represent staying (standing and talking) and moving is beneficial for describing office workers' activities. These methods provide viewpoints for binning a day's activities into time segments of appropriate length and understanding them. They also make it possible to easily identify characteristic typical diagrams out of the large number of analytical diagrams provided. Since they can extract components from large data volumes describing the relationships between time, space, and activity (staying/moving), they can be used to visualize typical scenes such as morning meetings and standing and talking.
The method proposed here can be expanded to examine several days' worth of data. In future research, the authors will develop methods for extracting patterns to obtain more definite findings from long-term data streams.