Using eigen decomposition and sequence-based representation to extract movement patterns from contextualized tracking data

State sequences are a new paradigm to encode and represent contextualised movement data. A state sequence is a temporal succession of characters representing categorical states of the moving entity or its surrounding environment. Eigen decomposition, a principal components analysis method, can be used to find patterns in such multi-dimensional categorical data through dimensionality reduction. Recurrent patterns can be found by identifying the most relevant eigenbehaviours, a set of vectors that characterize the variation in the behaviour of an entity during a time period. Dimensionality reduction techniques have so far not been widely used in movement analytics, and in this paper we demonstrate how they could help analyse the responses of a moving entity to dynamic environmental conditions. Specifically, we use sequence-based representation and eigen decomposition to investigate movement patterns of maned wolves (Chrysocyon brachyurus) in relation to vegetation vigour in their habitat. We use a set of GPS trajectories from a group of maned wolves, which we link to multi-source NDVI data as a proxy for the state of vegetation. We find that eigenbehaviours can identify patterns in the wolves' responses to dynamic environmental conditions that align with the current literature on the species. Our research highlights the potential of dimensionality reduction and sequence-based methods to identify patterns in large tracking databases linked to contextual data.


Introduction
Contextualised movement data incorporate the environmental conditions within which movement occurs by linking contextual layers to location points within a trajectory (Dodge et al., 2013), which often creates high-dimensional datasets that can be difficult to represent and interpret. Not all dimensions are necessary, because there often exists a smaller intrinsic dimensionality in the data set that explains most of the behavioural variance (Demšar et al., 2013); reducing dimensionality can therefore help to reveal fundamental patterns within the data. There are many strategies for dimensionality reduction, but the combined use of a sequence-based representation and principal components analysis (PCA) is new in movement research.
A sequence-based representation is traditional in medical sciences, particularly within bioinformatics (Abbott, 1995). In social sciences, a sequence represents the life events of an individual as a string of characters between a precise start and end point in their life (Ritschard and Studer, 2018). The same principle can be applied to represent movement trajectories, which can be converted into regularly sampled temporal series of characters describing states of the moving entity or of the surrounding environmental conditions (Brum-Bastos, Long and Demšar, 2018).
Dimensionality reduction with eigen decomposition, a PCA method, has been used for object and facial recognition, shape and movement description, data interpolation and animation (Pentland and Williams, 1989). Eigen decomposition retrieves the structure of the data cloud in the high-dimensional space by decomposing it into a set of eigenvalues and eigenvectors. Eigenvectors are orthonormal column vectors, ordered according to their eigenvalues (Vidal, Ma and Sastry, 2016), which represent dimensions of the data cloud: the first eigenvector identifies the direction of the largest variance in the data, the second one the direction of the second largest variance, and so on. By identifying the most relevant eigenvectors of a movement data set represented as sequences of states of an individual, eigen decomposition can find commonly repeated behavioural patterns, the so-called eigenbehaviours (Eagle and Pentland, 2009). Eigenbehaviours characterize the variation in the behaviour sequence of an entity during a time period. Eigenvectors with higher eigenvalues represent repeated behaviours, and a linear combination of an individual's eigenvectors can reconstruct and predict subsequent behaviour for each temporal unit in the sequential data. Eigenbehaviour analysis has been used on sequential data representing people's daily behaviours (Eagle and Pentland, 2009), but it has yet to be applied in movement ecology.
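The ordering of eigenvectors by eigenvalue can be illustrated with a minimal NumPy sketch. This is our own toy example, not the authors' code: we build a synthetic two-dimensional data cloud stretched along one axis, eigen-decompose its sample covariance matrix, and sort the eigenpairs so that the first eigenvector points along the direction of largest variance.

```python
import numpy as np


def eigen_decompose(X):
    """Eigen-decompose the sample covariance of X (rows are observations).

    Returns eigenvalues sorted from largest to smallest and the matching
    eigenvectors as the columns of a matrix.
    """
    Xc = X - X.mean(axis=0)                  # centre every column
    C = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)           # eigh: symmetric input, ascending output
    order = np.argsort(vals)[::-1]           # largest variance first
    return vals[order], vecs[:, order]


# Hypothetical data cloud: variance 9 along x, 0.25 along y
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
vals, vecs = eigen_decompose(X)
```

Here the first eigenvalue should be close to 9 and the first eigenvector close to (±1, 0). The product `vecs.T @ vecs` is the identity matrix, which is what orthonormality means.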
Searching for behavioural patterns in contextualised wildlife trajectories is a challenging task (Williams et al., 2019); in this paper we propose the use of sequence-based representation followed by eigen decomposition for this purpose. We demonstrate that these methods can be used together to explore how maned wolves (Chrysocyon brachyurus), the largest South American canids, respond to changes in their environment. Maned wolves are omnivores and their diet is mostly guided by the availability of prey and vegetation (Queirolo and Motta-Junior, 2007); we therefore expect changes in vegetation vigour to trigger changes in their movement behaviour. We combine maned wolves' GPS tracking data with multi-source, high spatio-temporal resolution NDVI (Normalized Difference Vegetation Index) data, an indicator used to characterize vegetation vigour from multispectral satellite images (Xu et al., 2012). Next, we define a sequence-based representation to convert annotated tracking data into annual sequences of daily NDVI states for each individual. Finally, we apply eigen decomposition to these sequences to extract recurrent patterns of NDVI use by maned wolves.
The rest of the paper is structured as follows: first we describe the wildlife tracking data and NDVI dataset used in our analysis, followed by how NDVI data were integrated with wildlife tracking data and converted into sequences. Next, eigen decomposition is applied to identify patterns related to the vegetation use and availability. We conclude with a discussion and ideas for future research.

Movement data and study area
Maned wolves are a savannah-adapted, vulnerable species of omnivores found south of the Amazon Forest (Figure 1A). The main threat to the species comes from continuous large-scale habitat loss (Noss and Lima, 2007), which is especially significant in Brazil because of the extensive conversion of Brazilian savannah into farmland (Fonseca et al., 1994). The Serra da Canastra National Park (CNP) (Figure 1B) has been key to the preservation of maned wolves, and it is the home of the wolves whose tracking data are used in this study. We analysed a movement dataset collected between March 2007 and July 2015 by GPS collars attached to 7 female and 6 male maned wolves (de Paula, 2016), with tracking periods varying from 59 to 841 days (Table 1 and Figure 2). Figure 1 shows the study area, land uses, the boundary of the CNP and the individual home ranges derived from the tracking data. Home ranges (HR) are the areas used by an individual during its normal activities of foraging, mating and rearing (Burt, 1943). As is typical in movement ecology, we defined them by delineating the 95% contour of the utilisation distribution (UD) for each individual (Hayne, 1949), where the UD was calculated using kernel density estimation.
Figure 1 - … (Table 1) intersect to a large extent. The land use map in the background was produced by de Paula (2016).
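One common way to delineate a 95% UD home range can be sketched in NumPy as follows. The kernel settings used for the wolves are not given here, so this rests on our own assumptions: an isotropic Gaussian kernel with a fixed, hypothetical bandwidth h, a regular evaluation grid, and projected coordinates in metres. The density is evaluated on the grid, and the home range is the smallest set of the densest cells that together hold 95% of the probability mass.

```python
import numpy as np


def kde_grid(points, h, xs, ys):
    """Gaussian kernel density of 2-D points, evaluated at grid cell centres."""
    gx, gy = np.meshgrid(xs, ys)                   # shape (len(ys), len(xs))
    dx = gx[..., None] - points[:, 0]              # broadcast over all fixes
    dy = gy[..., None] - points[:, 1]
    k = np.exp(-(dx**2 + dy**2) / (2 * h**2)) / (2 * np.pi * h**2)
    return k.mean(axis=-1)                         # mean kernel = density estimate


def ud_isopleth_mask(density, level=0.95):
    """Boolean mask of the densest cells that together hold `level` of the mass."""
    flat = density.ravel()
    order = np.argsort(flat)[::-1]                 # densest cells first
    cum = np.cumsum(flat[order]) / flat.sum()      # cumulative share of mass
    n_cells = np.searchsorted(cum, level) + 1      # smallest set reaching `level`
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_cells]] = True
    return mask.reshape(density.shape)


# Hypothetical GPS fixes (metres) scattered around a den site at the origin
rng = np.random.default_rng(1)
fixes = rng.normal(scale=500.0, size=(300, 2))
xs = ys = np.linspace(-2500.0, 2500.0, 61)         # grid cells of about 83 m
density = kde_grid(fixes, h=200.0, xs=xs, ys=ys)
home_range = ud_isopleth_mask(density, level=0.95)
```

Averaging kernels treats every fix as independent. In practice, bandwidth selection and the autocorrelation of GPS fixes strongly affect the resulting home range.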

NDVI data
NDVI (Normalized Difference Vegetation Index) is a proxy for the content and state of live green vegetation, and it is used as an indicator of vegetation vigour. NDVI is calculated from the spectral reflectance in the red ($\rho_{red}$) and near infra-red ($\rho_{NIR}$) wavelengths as $NDVI = (\rho_{NIR} - \rho_{red}) / (\rho_{NIR} + \rho_{red})$ (Rouse et al., 1973), so NDVI values range from -1 to 1. Values smaller than 0.1 are typically associated with bare rock, sand or snow; values of around 0.2 to 0.5 with sparse vegetation such as shrubs, grasslands or senescing crops; and values between 0.6 and 1.0 with dense vegetation, such as tropical forests or crops at their peak growth stage (Hurley et al., 2014).
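The NDVI formula and the indicative value ranges quoted above can be sketched in NumPy as follows. The function names are ours. The label "transitional" for values that fall between the quoted ranges (0.1 to 0.2 and 0.5 to 0.6) is also our own placeholder and not part of the cited classification.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red), element-wise on reflectances."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def ndvi_cover(value):
    """Indicative land-cover reading of an NDVI value (ranges from the text)."""
    if value < 0.1:
        return "bare rock, sand or snow"
    if 0.2 <= value <= 0.5:
        return "sparse vegetation"
    if value >= 0.6:
        return "dense vegetation"
    return "transitional"  # hypothetical label for the gaps between ranges


# Example reflectances: a healthy canopy reflects strongly in the NIR
canopy = float(ndvi(0.45, 0.05))
reading = ndvi_cover(canopy)
```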
We used an NDVI data set with high spatio-temporal detail, created by fusing data from multiple satellites with the methodology proposed by Rao et al. (2015). We combined data from MODIS, which has a high temporal resolution and a low spatial resolution, with data from Terra-ASTER, Landsat 4-5-7-8, CBERS 2 and CBERS 2B, which have a low temporal resolution but higher spatial resolutions. The fused data therefore provided one NDVI image per day at a 30 m spatial resolution, instead of daily MODIS images at 250 m. Details of this multi-source NDVI data fusion are in (removed for peer review).

Annotating trajectories/home ranges and creation of sequences
For each individual, we annotated the GPS tracking data with high-resolution NDVI using nearest-neighbour annotation, matching each GPS point with the closest NDVI value in space and time (Brum-Bastos, Long and Demšar, 2016). We further assigned all NDVI pixels that fell within a specific home range to that home range.
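Nearest-neighbour annotation in space and time can be sketched as below. The sketch rests on assumptions of ours, not details from the paper: the fused NDVI is held as a north-up stack of daily 30 m images with a known upper-left corner, GPS coordinates are in the same projected system, and fix times are expressed in days since the first image. For a point inside the grid, the pixel that contains it is also the pixel whose centre is nearest. Fixes outside the raster get NaN.

```python
import numpy as np


def annotate_nearest(fixes, stack, x0, y0, res):
    """Attach to each GPS fix the NDVI of the nearest daily image and pixel.

    fixes : (n, 3) array of x, y (projected metres) and t (days since image 0)
    stack : (n_days, n_rows, n_cols) array of daily NDVI images, north-up
    x0, y0: coordinates of the upper-left corner of the grid; res: pixel size
    """
    x, y, t = fixes[:, 0], fixes[:, 1], fixes[:, 2]
    n_days, n_rows, n_cols = stack.shape
    day = np.clip(np.rint(t).astype(int), 0, n_days - 1)   # nearest image in time
    col = np.floor((x - x0) / res).astype(int)             # pixel containing x
    row = np.floor((y0 - y) / res).astype(int)             # rows increase southwards
    inside = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)
    out = np.full(len(fixes), np.nan)
    out[inside] = stack[day[inside], row[inside], col[inside]]
    return out


# Two hypothetical 3 x 3 daily images of 30 m pixels, upper-left corner at (0, 90)
stack = np.arange(18, dtype=float).reshape(2, 3, 3)
fixes = np.array([[45.0, 75.0, 0.2],     # day 0, row 0, col 1
                  [85.0, 5.0, 0.9],      # day 1, row 2, col 2
                  [100.0, 5.0, 0.0]])    # east of the grid
values = annotate_nearest(fixes, stack, x0=0.0, y0=90.0, res=30.0)
```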
To create the temporal state sequences for eigen decomposition, we followed a four-step process: 1) we computed the empirical distribution functions (EDF) of NDVI pixels within each home range for each day to characterise available resources; 2) we calculated average NDVI values for each day and wolf, which created numerical sequences of NDVI values for each wolf with one value per day; 3) using the EDF of the respective day and home range of the specific wolf, we created another numerical sequence for each wolf, in which the average NDVI was replaced with the empirical probability of the average NDVI value on that day in that home range, to identify the relationship between used and available resources; 4) finally, we categorized the average NDVI values into states of use (high, low or average) based on $P(\overline{NDVI})$, where $\overline{NDVI}$ is the average NDVI value for a given day and home range and $P(\overline{NDVI})$ is the probability of that value in that home range and day, calculated from the respective EDF. This produced state sequences of high, low and average NDVI states at daily temporal resolution for each wolf, describing the average use of highly, poorly and averagely vegetated habitat.
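The four steps can be sketched for a single day and wolf as follows. We assume that the "empirical probability" is the EDF value F(x), the share of home-range pixels with NDVI at most x. The classification cut-offs are not given in this section, so low=0.25 and high=0.75 are hypothetical placeholders. All function names are ours.

```python
import numpy as np


def edf_value(pixels, x):
    """Empirical distribution function of home-range pixels evaluated at x."""
    pixels = np.asarray(pixels, dtype=float)
    pixels = pixels[~np.isnan(pixels)]           # ignore missing pixels
    return np.mean(pixels <= x) if pixels.size else np.nan


def day_state(fix_ndvi, pixels, low=0.25, high=0.75):
    """State of one wolf-day: H, L or A NDVI use, or N if there are no data.

    fix_ndvi: NDVI annotated at the day's GPS fixes (step 2 averages them)
    pixels  : NDVI of all pixels in the home range on that day (step 1)
    low/high: HYPOTHETICAL cut-offs on the EDF value (steps 3 and 4)
    """
    fix_ndvi = np.asarray(fix_ndvi, dtype=float)
    if fix_ndvi.size == 0 or np.all(np.isnan(fix_ndvi)):
        return "N"
    mean_ndvi = np.nanmean(fix_ndvi)             # step 2: daily average used NDVI
    p = edf_value(pixels, mean_ndvi)             # step 3: position in available NDVI
    if np.isnan(p):
        return "N"
    if p >= high:                                # step 4: categorise
        return "H"
    if p <= low:
        return "L"
    return "A"


# Hypothetical home range whose pixels span NDVI 0.2 to 0.8 on one day
pixels = np.linspace(0.2, 0.8, 101)
states = [day_state(f, pixels) for f in ([0.74, 0.76], [0.25], [0.5], [])]
```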
In the next step we cut the state sequences for each wolf into annual sequences starting on the 1st of July, the start of the wolf year, which is typically defined as the time when the whelping rate peaks (de Paula et al., 2013). The wolf cycle consists of three periods: whelping (Jun-Sept, dry season), a non-reproductive period (Oct-Feb, wet season) and breeding (Mar-Jun, dry season).
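Cutting a daily state sequence into wolf years can be sketched with the Python standard library. The following are our own assumptions, since the paper does not say how gaps or leap years were handled. Days outside the tracking period are padded with the no-data state N. Each wolf year is labelled by the calendar year in which it starts. Sequences are truncated to 365 days, so in wolf years that contain 29 February the last day (30 June) is dropped.

```python
from datetime import date, timedelta


def wolf_years(start, states, year_len=365):
    """Split a daily state string into wolf-year sequences beginning 1 July.

    start  : date of the first character in `states`
    states : one character per consecutive day, e.g. "HHALN..."
    Returns {starting calendar year: sequence of exactly `year_len` characters}.
    """
    end = start + timedelta(days=len(states) - 1)
    # The wolf year containing `start` began on the most recent 1 July
    first = start.year if start >= date(start.year, 7, 1) else start.year - 1
    years = {}
    for year in range(first, end.year + 1):
        year_start = date(year, 7, 1)
        if year_start > end:                     # tracking ended before this 1 July
            break
        seq = []
        for offset in range(year_len):
            i = (year_start + timedelta(days=offset) - start).days
            seq.append(states[i] if 0 <= i < len(states) else "N")  # pad gaps
        years[year] = "".join(seq)
    return years


# Hypothetical wolf tracked from 29 June to 3 July 2010
seqs = wolf_years(date(2010, 6, 29), "LLHHA")
```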
Each wolf-year sequence was treated individually, even when there were multiple cycles linked to the same wolf. We stacked all wolf-year sequences into $S(n, 365)$, a two-dimensional $n$ by 365 array (Figure 2), where $n$ was the total number of sequences and 365 is the number of days within a year. Each day in this matrix was represented by one of four characters, corresponding to the three states (high (H), low (L) and average (A) NDVI) or to no data (N). We converted $S$ into $S'$, an $n$ by 1460 array of binary values (Figure 2), in which each day is encoded by four binary indicators, one per character (4 × 365 = 1460 columns).
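Our reading of the conversion of the character matrix into a binary matrix is a one-hot encoding, sketched below in NumPy. Every day becomes four 0/1 indicators, one per character, which gives 4 × 365 = 1460 columns. The state-major column order (all 365 H indicators first, then L, A and N) is an assumption of ours, as are the names.

```python
import numpy as np

STATES = "HLAN"   # high, low and average NDVI, and no data


def to_binary(sequences, states=STATES):
    """Stack equal-length state strings into an n x (len(states) * T) 0/1 matrix.

    Columns s*T ... (s+1)*T - 1 indicate state `states[s]` on days 0 ... T-1,
    so every row contains exactly one 1 per day.
    """
    chars = np.array([list(seq) for seq in sequences])        # n x T characters
    blocks = [(chars == c).astype(np.int8) for c in states]   # one n x T block per state
    return np.hstack(blocks)


# Two hypothetical wolf-year sequences of 365 days
B = to_binary(["H" * 365, "L" * 100 + "A" * 200 + "N" * 65])
```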

Eigenbehaviour analysis: identifying structure in vegetation use
To identify particular behaviours, we then applied eigen decomposition to the binary matrix $S'$, creating a set of 1460 eigenvectors (one per column of $S'$, i.e., four states for each of the 365 days) with corresponding eigenvalues. The vectors with the highest eigenvalues are called the primary eigenbehaviours (Eagle and Pentland, 2009). As is typical in dimensionality reduction with PCA, we used a scree plot (Jolliffe, 2002) to heuristically determine which eigenvectors (behaviours) describe most of the variance and which can be eliminated (Figure 3). In the movement context, this means that eigenvectors with low eigenvalues reflect individual behaviour, whereas eigenvectors with higher eigenvalues reflect behaviours that are common to most wolves in the study, i.e., population behaviour, which is what we are interested in.
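Eigen decomposition of the binary matrix, together with the explained-variance shares that a scree plot displays, can be sketched as follows. This is a NumPy sketch in which toy random sequences stand in for the wolf data, and the names are ours. Because there are far fewer wolf years than 1460 columns, at most n - 1 eigenvalues are non-zero. For large matrices, a singular value decomposition of the centred data gives the same eigenvectors more cheaply.

```python
import numpy as np


def eigenbehaviours(B):
    """Eigen-decompose the covariance of a binary matrix B (rows = wolf years).

    Returns eigenvalues (descending), eigenvectors (as columns) and the
    share of total variance explained by each eigenvector.
    """
    C = np.cov(B, rowvar=False)                 # columns are variables; centres B
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]              # largest eigenvalue first
    vals = np.clip(vals[order], 0.0, None)      # remove tiny negative round-off
    return vals, vecs[:, order], vals / vals.sum()


# Toy stand-in for the wolf data: 30 random wolf years of 365 daily states
rng = np.random.default_rng(2)
chars = rng.choice(list("HLAN"), size=(30, 365))
B = np.hstack([(chars == c).astype(float) for c in "HLAN"])   # 30 x 1460
vals, vecs, share = eigenbehaviours(B)
# share[:k].sum() is the variance kept by the first k eigenbehaviours
```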

Results
The scree plot (Figure 3) shows that the first eigenbehaviour explained 45% of the variance in the data, the second 31.6%, the third 5.08%, the fourth 4.08% and the fifth 2.57%. The percentage of variance plateaus from the 6th eigenbehaviour onwards at less than 1% per eigenvector; we therefore kept only the first five eigenbehaviours, which together accounted for 88.33% of the variance. Figure 4 shows the first five eigenbehaviours with the absolute coefficients of each eigenvector. A higher eigenvector coefficient indicates a higher contribution of a particular NDVI state on a particular day, and a lower coefficient the opposite. Eigenvectors in the context of PCA are difficult to interpret (Jolliffe, 2002); in the following, however, we tentatively interpret the results in the context of how we defined the states. The first eigenbehaviour seems to correspond to years in which wolves stay in areas of average NDVI before and during the wet season and start to choose higher NDVI at the end of the wet season. The other eigenbehaviours can be interpreted similarly, with differences in the NDVI values chosen in different seasons. Further, all eigenbehaviours in the second row show a persistent trend of choosing areas of low NDVI at specific times of the year.
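The cumulative figure can be checked from the reported shares. The short sketch below also encodes the stopping rule described above, which keeps eigenbehaviours while each one explains at least 1% of the variance. Only the five reported values are used, and the function name is ours.

```python
from itertools import accumulate

# Percentage of variance explained by eigenbehaviours 1 to 5, as reported
reported = [45.0, 31.6, 5.08, 4.08, 2.57]


def n_kept(shares, threshold=1.0):
    """Count leading eigenbehaviours that each explain at least `threshold` %."""
    k = 0
    for s in shares:
        if s < threshold:          # plateau reached: stop keeping components
            break
        k += 1
    return k


cumulative = [round(c, 2) for c in accumulate(reported)]
```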

Figure 4 - The top five eigenbehaviours for all the wolves, where each column represents one eigenbehaviour and each row one of the NDVI states. Each box represents one year and each vertical line one day. Shades of red on each day indicate the absolute coefficients of each eigenvector: the higher the coefficient on a particular day, the higher the importance of the particular NDVI state on that day. The row for the no data (N) state is omitted because it does not provide any information about the behaviour. The letters underneath the boxes indicate the seasons, D for dry and W for wet. The season axis is magnified at the bottom of the figure, where it is matched with the key biological periods for the species.
To make the eigenvectors easier to interpret in the context of wolf ecology, we simplified the sequences by selecting the state with the highest eigenvector coefficient on each day (Figure 5). The first eigenbehaviour shows a preferential selection of high NDVI areas after the wet season; the second shows a weaker preferential selection of high NDVI areas, only in the first half of the wet season; the third shows preferential selection of high NDVI in the first half of the wet season and, to a lesser extent, at the beginning of the dry season. The fourth eigenbehaviour shows preferential selection of high NDVI areas from the middle of the wet season onwards, and the fifth shows preferential selection of high NDVI areas during the entire year. The second, third and fourth eigenbehaviours show a pattern of preferential selection of low NDVI areas in the second third of the wet season and at the end of the dry season.
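The simplification can be sketched as a per-day argmax over the state rows of an eigenvector, using NumPy and names of our own. We assume a state-major column layout for the binary matrix (365 H coefficients, then L, A and N). As in Figure 4, we compare absolute coefficients and ignore the no-data state.

```python
import numpy as np


def dominant_states(eigvec, states="HLAN", n_days=365, ignore="N"):
    """For each day, the state whose eigenvector coefficient is largest in magnitude."""
    W = np.abs(np.asarray(eigvec, dtype=float)).reshape(len(states), n_days)
    keep = [i for i, s in enumerate(states) if s not in ignore]   # drop no-data row
    best = np.argmax(W[keep], axis=0)                             # row index per day
    return "".join(states[keep[b]] for b in best)


# Hypothetical eigenvector: H dominates the first 200 days, L the rest
v = np.zeros(4 * 365)
v[:200] = 0.05                   # H coefficients, days 0 to 199
v[365 + 200:2 * 365] = -0.08     # L coefficients, days 200 to 364 (sign is irrelevant)
summary = dominant_states(v)
```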

Discussion and conclusion
This paper demonstrates the potential of combining sequence-based and dimensionality reduction methods to identify patterns in contextualised movement data. An advantage of our method is that eigenbehaviours are not limited by the number of trajectories or by the time period covered by the analysis. However, an outstanding challenge is how to interpret the resulting eigenbehaviour patterns and how to validate the methodology. Interpreting eigenvectors is a well-known problem for dimensionality reduction methods based on PCA (Demšar et al., 2013); in the case of wildlife movement, interpretation could be supported by collecting additional observational data on movement behaviour.
In the absence of observational data, which are costly to obtain, the patterns we identified were to some extent supported by the current literature on how wolves use vegetated areas in different seasons. We found that most wolves choose greener areas during the dry season, which tentatively agrees with the literature reporting that wolves have a special preference for wolf's fruit (Solanum lycocarpum) in the dry season (Queirolo and Motta-Junior, 2007). This fruit grows on a flowering shrub up to 5 m tall with large leaves, which has a higher NDVI response than most other vegetation in the study area, such as the heath and grasslands. Most wolves double their food intake during the breeding season (Stahler, Smith and Guernsey, 2006), which tentatively matches our result that wolves choose areas with higher NDVI during the breeding season. Visits to areas of low NDVI may be linked to denning and whelping, as wolves typically choose rocky areas, which have low NDVI, for this purpose. Visits to areas with higher NDVI could be linked to feeding and foraging, since these animals eat not only fruits but also small mammals that are often found in vegetated areas.
State sequences are a recent representation paradigm in movement research (Dodge, Laube and Weibel, 2012; De Groeve et al., 2016). Applying eigen decomposition to sequences could help generalise and reduce thousands of trajectories to a few representative ones, retrieved in the form of eigenbehaviours. In addition, eigen decomposition can help meet the increasing demand for context-aware methods, especially for similarity analysis, as it can separate group behaviour from individual behaviour by finding the first few principal components (eigenbehaviours) of contextualised movement data.
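The idea that a few eigenbehaviours can stand in for many sequences can be sketched by projecting each wolf year onto the top k eigenvectors and reconstructing it. The sketch uses NumPy, and the names are ours. Treating the reconstruction as the shared, group-level part and the residual as the individual part is our illustration of the separation described above, not a procedure from the paper.

```python
import numpy as np


def project_reconstruct(B, k):
    """Project rows of B onto the top-k eigenvectors of its covariance.

    Returns the weights (n x k), the rank-k reconstruction (n x m) and the
    squared reconstruction error of every row.
    """
    mu = B.mean(axis=0)
    Bc = B - mu                                    # centred rows
    vals, vecs = np.linalg.eigh(np.cov(B, rowvar=False))
    E = vecs[:, np.argsort(vals)[::-1][:k]]        # m x k leading eigenvectors
    W = Bc @ E                                     # weight of each eigenbehaviour
    R = mu + W @ E.T                               # reconstruction from k components
    err = ((B - R) ** 2).sum(axis=1)               # residual left to the individual
    return W, R, err


# Toy binary matrix: 6 hypothetical wolf years, 40 columns
rng = np.random.default_rng(3)
B = (rng.random((6, 40)) < 0.3).astype(float)
_, _, err1 = project_reconstruct(B, 1)
_, _, err5 = project_reconstruct(B, 5)
```

With 6 rows, the centred data span at most 5 dimensions, so 5 eigenbehaviours reconstruct every row exactly. Fewer components leave a larger residual.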
In this paper our trajectories were annotated with only one type of contextual data: the NDVI values. In sequence analysis this corresponds to the traditional single-channel analysis from bioinformatics (Abbott, 1995), which considers only one type of state for each character in a sequence. However, movement rarely depends on one environmental variable only, and trajectories are often combined with a number of different environmental descriptors (Gilbert et al., 2017). Each of these can potentially form a separate sequence, and these sequences can be linked into a so-called multi-channel sequence (Müller et al., 2008). For movement, a multi-channel approach was used to link human movement to diverse weather variables (Brum-Bastos, Long and Demšar, 2018); in the wildlife movement context, however, the multi-channel methodology is virtually unknown. We plan to extend the eigenbehaviours approach to a multi-channel situation on a wildlife tracking data set that describes long-distance bird migration; the aim is to evaluate if and how birds respond to different components of the Earth's magnetic field (inclination, intensity and direction) and if and how this drives their navigational strategies.