Hey You! Let’s Talk. Dialogue-Initiatives Revisited for Wayfinding Instructions



Introduction
Despite the complexity of the task, wayfinding, i.e. deliberately getting from a starting point to a destination in a familiar or unfamiliar environment [see e.g. 1,2], is successfully achieved by humans on an everyday basis. Systems that assist with this task and reduce its complexity have attracted considerable research interest [see e.g. 3]. Among these efforts, route instructions per se, their calculation and their presentation have received much attention in research on wayfinding assistance systems (see Section 2). The empirical foundation of the phrasing used for these instructions has, however, only attracted interest in recent years [see 4]. With respect to the design of wayfinding assistance systems, human-to-human, real-world route instructions are particularly worthwhile to study from a linguistics point of view [see 5] for two reasons: First, the linguistic interactions can be mimicked by these systems in order to increase the likelihood of successful wayfinding. Second, the linguistic properties give an insight into how systems should act to resolve misunderstandings and, thereby, (re-)establish so-called Common Ground (CG). Up until now, however, we lack sufficient knowledge about how elements of discourse relate to wayfinding success [see 6]. In this paper, we try to add to this knowledge by (1) presenting the POPRIS-corpus of turn-by-turn route instructions given over the phone, which was collected in an in-situ experiment in an environment participants were unfamiliar with; (2) providing evidence for the importance of CG and of different modes of establishing it; and (3) deriving suggestions for Human-Computer Interaction (HCI) design for (re-)establishing CG in wayfinding assistance systems.

Related Work
In line with the focus of our paper, we review three strands of related work. An overview of existing corpora on route instructions provides a rationale for collecting a new corpus. We continue with existing literature from a linguistics perspective. Finally, we shed light on work on route instruction generation in the domain of wayfinding assistance systems.

Corpora of Route Instructions
Although studies on the elements of route instructions regularly collect a corpus (i.e. "[a] collection of linguistic data, either compiled as written texts or as a transcription of recorded speech" [7]), only few corpora are made publicly available. Recently, Liu, Tree, and Walker [8] presented a corpus of over-the-phone in-situ conversations during which the sender S is required to guide the receiver R to several different artworks based on maps. Similar to our German-language POPRIS-corpus, this corpus encompasses transcriptions of full conversations, in this case in English. Götze and Boye [9] present and describe the SPACEREF corpus, which consists of descriptions of actions and the spatial environment of a pedestrian while walking a predefined route in a Swedish town. Tenbrink et al. [6] present a corpus rich in linguistic expressions of spatial relations in German; these, however, were collected in a "referential communication task that involved furnishing a dolls' house" [6, p. 2]. Other dialogue corpora with spatial expressions have been compiled during experimental tasks to explain routes on maps in order to study cognitive reasoning during complex tasks. The HCRC Map Task Corpus [10] is a well-known example of this type of collection. Corpora have also been used to compare L1 and L2 learners and their capabilities to generate spatial expressions [see e.g. 11,12,13]. Taken together, several of these corpora include route instructions. Compared to these, the contribution of our POPRIS-corpus is twofold: 1) It contains the full dialogues between pairs of people unfamiliar with the environment during the whole experiment; 2) data was gathered by means of an in-situ experiment during which on-line, turn-by-turn route instructions over the phone were collected from participants, who were not allowed to use any aids to perform the task. Given this setting, these dialogues are expected to yield significant insights into a referential, real-world collection of utterances suitable for improving HCI dialogue structures.

Elements of route instructions
The elements of route instructions have attracted long-lasting interest across a range of disciplines, including linguistics [see 14, for an extensive overview], psychology, geography [see 15, pp. 65-67 for early references] as well as AI and cognitive science. Consequently, the literature on route instructions is vast. This overview can, therefore, only mention highlights of these different strands; we focus on the Linguistics and GIScience perspectives.
Linguistics. Detailed analyses of elements of route instructions have been conducted throughout the years, essentially starting in the late 1970s/early 1980s. Wunderlich [16] authored a highly influential publication for the assessment of German-language route instructions. He provides a discourse-theoretic analysis of these instructions and, thereby, identifies different discourse markers for different segments of a route. Klein [17] uses a corpus of route instructions given to a receiver prior to navigation in order to analyse deictic elements. Despite the fact that Klein neither focuses on nor points to these, his examples resemble topic constructions based on existential-presentative constructions (EPCs) [18] as found in our corpus.
More recent analyses have, for instance, focused on differences between males and females in in-situ collected directions given to car drivers [19]; others have investigated lexical choice and referring expressions in over-the-phone instructions [20]. Generally speaking, linguistic accounts of route instructions have recently regained momentum through Cognitive Discourse Analysis [CODA,4]. This method is particularly suitable to examine how decision points are linguistically represented (e.g. in terms of syntactic position). It is, therefore, of particular interest to our study, as we provide evidence that EPCs are syntactic patterns which are used to mark decision points by means of landmarks (see Section 5).
GIScience. The importance of landmarks for spatial cognition and, consequently, their prevalence in route instructions has been studied frequently. Lovelace, Hegarty, and Montello [15] introduce different types and roles of landmarks and present evidence that the quality of route instructions is a function of the types of landmarks and their combinations. Since then, numerous attempts have been made to model [see e.g. 21,22,3,23,24] and measure the salience of objects [see e.g. 25,26,27,28], as this is a prerequisite to enable references to landmarks in route instructions.
In addition to that, considerable effort has been put into creating and assessing systems which are suitable to provide route instructions to users based on landmarks. The adaptation of route instructions to the spatial context or user knowledge and preferences in Artificial Intelligence and Cognitive Science research has seen considerable interest as early as the mid-1990s: For example, the MOSES project [29,30,31] investigated route generation by taking the perception and changes of spatial layout during navigation into account. These efforts already resemble the problem of appropriate levels of granularity in route instructions which has been of particular importance in numerous studies. Tomko and Winter [32] present a recursive algorithm to determine route instructions which adapt their level of granularity to the distance to the destination. Based on wayfinding choreme theory, Richter and Klippel [33] develop a hierarchy of route direction elements. Richter, Tomko, and Winter [34] suggest a dialogue-based solution in which users can request more details to solve the issue of adapting the instruction's granularity to a user's knowledge. Finally, cognitively ergonomic route instructions based on chunking [35,36] have been a milestone in research on adapting route instructions to user knowledge. Researchers have, moreover, investigated suitable means to include landmarks in route instructions from a computational linguistics perspective [see e.g. 37]. Disambiguation of landmarks has been identified as a major issue in these efforts; the use of EPCs as topic construction implies this disambiguation (see Section 5.1). It is important to note that the corpus we present may be equally useful for each of these disciplines. Having said this, our analysis focuses on the discourse, sentence and dialogue structures in human-to-human conversations in order to derive HCI-guidelines for wayfinding assistance systems.

Experimental Design
In this section we provide details for the experimental design based on which we collected the corpus of route instructions. The whole experimental design is based on the goal to study the discourse which leads to successful route instructions in a real-world wayfinding situation.

Routes
The experiments took place in different parts of Regensburg, a small European town of Roman origin, its suburbs or nearby villages, depending on which area potential participants were not familiar with (see below for further explanations). In order to find comparable routes across experimental areas (see [3] for a model on wayfinding decision situations), we made use of the intersections framework introduced in [38]. The chosen routes (see Section 4.2) reflect the relative frequency of the 3- and 4-way intersections (see Table 1), which were the two most frequent intersection types.

Table 1. Characteristics of the routes; Br denotes branches, Len. denotes the length of the route, Dur. denotes the walking duration according to Google Maps, PDP means potential decision point, TDP means true decision point, %(3) and %(4) describe the relative frequencies of 3- and 4-way intersections included in the route, Norm (3) and Norm (4) display the relative frequencies of 3- and 4-way intersections in the whole area of Regensburg as reference values. Due to the spatial configuration of area 4, however, it was impossible to include 4-way intersections in these routes.

Finding pairs of participants
Potential participants were recruited through leaflets distributed at the local university. Any person who expressed interest was required to fill in an online survey on sense of direction (FRS, [39]), the Big Five Personality Traits (BFI-60, [40]), demographic data (age, gender) and their spatial knowledge about several regions: For each of 14 different outdoor environments, which were clearly marked as polygons on a map, participants were required to indicate whether they had ever been there before and, if so, how well they knew this area. The goal of this procedure was to find environments with which participants were not familiar. All experiments were thus conducted in an environment participants were not familiar with, in order to avoid potential bias from familiarity (see [41] for evidence on the impact familiarity has on the way route instructions are given). Once a pool of potential participants was found, pairs of participants were randomly chosen, roles (sender/receiver) were assigned randomly and, if a pair of participants indicated to be unfamiliar with more than one experimental area, the area was chosen randomly, too.
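The random assignment described above can be illustrated with a short sketch. It is not the implementation used in the study, which the text does not specify; the function name `assign_pairs`, the participant and area identifiers, and the assumption that two partners must share at least one unfamiliar area are hypothetical choices made for illustration only.

```python
import random


def assign_pairs(unfamiliar_areas, rng):
    """Randomly pair participants who share at least one unfamiliar area.

    `unfamiliar_areas` maps a (hypothetical) participant id to the set of
    area ids that participant reported not knowing. Returns a list of
    (sender, receiver, area) tuples; participants for whom no compatible
    partner is left remain unassigned.
    """
    pool = list(unfamiliar_areas)
    rng.shuffle(pool)  # random order of participants
    assignments = []
    while pool:
        first = pool.pop()
        for i, second in enumerate(pool):
            shared = unfamiliar_areas[first] & unfamiliar_areas[second]
            if shared:
                pool.pop(i)
                pair = [first, second]
                rng.shuffle(pair)  # random role assignment (sender, receiver)
                area = rng.choice(sorted(shared))  # random shared area
                assignments.append((pair[0], pair[1], area))
                break
    return assignments


# Illustrative survey answers: participant id -> unfamiliar area ids.
survey = {"p1": {1, 2}, "p2": {2, 3}, "p3": {3}, "p4": {1}}
print(assign_pairs(survey, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the assignment reproducible, which may be useful when the procedure has to be documented.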

Procedure
In order to prevent any interaction between sender and receiver before the experiment, both were required to arrive separately at the starting point of their assigned route. The experimental procedure comprised five steps: 1. A student experimenter and the sender met at a bus stop close to the starting point of the assigned route. 2. The sender was instructed that the experimenter would guide her/him on a route, that s/he would later be required to give route instructions to the receiver over the phone and that these should enable the receiver to follow the route exactly. 3. The experimenter guided the sender along the predefined route, avoiding conversations as much as possible in order to prevent potential bias. 4. At the destination, senders were asked to wait until the receiver called and were reminded of their task. 5. The experimenter met the receiver at the starting point and instructed her/him that the sender would guide her/him over the phone on a specific route and that s/he was required to follow this route exactly. The experimenter did not intervene, but followed the receiver silently and noted down any wrong turns. The conversations between sender and receiver were audio recorded.

An overview of corpus data
The POPRIS-corpus consists of the full transcripts of sixteen experiments (N = 32 participants) covering 3:22:32 hours of German-language audio material in total. Audio recordings were transcribed using the software f4transkript [see 42] in accordance with the HIAT [see 43] guidelines. In particular, the following spoken-language linguistic phenomena are part of the transcripts: pauses, overlap, hesitation, emphasis, repair, lengthening, sentence type, anacoluthon, linking and non-verbal information. The complete discourse between sender and receiver is included in the transcripts. As the full experiment was recorded and transcribed, the transcripts also cover conversation between sender and receiver which is not related to the route guidance task itself; this yields a full picture of the conversations occurring during this type of task, which is potentially valuable for other disciplines. Out of the 16 experiments, one was abandoned at the request of the sender, who, by their own admission, was unable to guide the receiver along the route. In addition to the transcripts, the corpus includes age and gender as demographic variables, the 19-item questionnaire Fragebogen räumliche Strategien (i.e. a German-language sense of direction scale, [39]) and the 60 items of the BFI-2 [40]. Based on the norm data of the German population [44], the majority of the participants have generally high spatial abilities. These figures reflect the fact that the willingness to participate in a wayfinding experiment is increased for those who perceive their sense of direction as good. As far as the Big Five Personality Traits are concerned, participants show mainly (above) average values (see [40] for norm data) for agreeableness (f

Data and Software Availability
The POPRIS-Corpus is publicly available through the zenodo.org platform (DOI: 10.5281/zenodo.3695744). It includes the transcripts, a QGIS-file containing all routes as well as age, gender, FRS and BFI-2 data for each of the participants.

Important Definitions
The analysis is based on three important linguistic concepts, namely Common Ground (CG) content, topic constructions and existential-presentative constructions (EPCs). CG content can be seen as the shared knowledge of interlocutors, which consists of propositions and already introduced entities [see e.g. 45]. It is subject to continuous change throughout the conversation [see 46, p. 245]. Generally speaking, topic constructions consist of two parts, the topic and the comment: "The topic constituent identifies the entity or set of entities under which the information expressed in the comment constituent should be stored in the CG content" [45, p. 5]. EPCs [18] are one of several existing types of topic constructions. As landmarks are entities and references to these are known to be used in human-to-human route instructions, EPCs are an obvious means of CG management [46] and will be used as a basis for our analysis.

Discourse Analysis Results
The importance of EPCs
The whole corpus contains three different cases of how EPCs are used in instructions, two of which show important ways of repairing insufficient CG. The corpus data provide evidence that EPCs are the most common syntactic way of introducing a new (i.e. not previously mentioned) object which stands out in the local environment. Most frequently, the EPC established CG, which, in turn, resulted in a comprehensible route instruction. Sample Dialogue Case 1 provides an example of this type of discourse: In this case, a route instruction is split into three subsequent actions: The sender uses an EPC ("And uh can you see/ there is another bus stop sign") to introduce a new entity. The sender, thereby, expects the receiver to explicitly confirm that s/he can identify the object s/he referred to in the local environment. This is a mandatory step in order to ensure that sender and receiver share a CG. When CG is achieved (confirmation by the receiver: "Yes"), the sender continues with the actual route instruction ("And now you just go towards it"), which is, in fact, the comment of the topic construction.
Occasionally (27 times), however, the POPRIS-corpus contains discourse segments which show a misunderstanding between sender and receiver. Two different variants exist, in both of which CG is not successfully established, but for different reasons. In variant 1, CG is not achieved, although sender and receiver essentially share it. This means that the sender imagines the receiver to be at a particular location where the receiver in fact is. Yet, their communication about the location is unsuccessful. Sample Dialogue Case 2 provides an example.
Sample Dialogue Case 2: A dialogue example during which a misinterpretation (church vs. right-hand bend) occurs between sender and receiver and is repaired by introducing new salient objects and references to those which have already been part of the CG content.
The sender uses an EPC to introduce a feature of the street network ("right-hand bend") as a salient object. The receiver, however, does not confirm this immediately but introduces a salient object ("yellow house"), too. One reason might be that the road network feature is less salient to R than the colour of the facade. From this turn on, sender and receiver swap their roles, i.e. R introduces salient objects using EPCs (e.g. "church") and expects the sender to confirm spatial knowledge about these. S, however, does not remember a church and CG is not achieved. The roles, consequently, remain swapped until CG is established.
In contrast to variant 1, CG cannot be established in variant 2 because it was not established as part of the preceding route instruction. Sample Dialogue Case 3 provides an example of re-establishing CG. Please note: In the previous step, R had mistakenly confirmed the landmark ("silver bridges"), which led S to assume that CG was established when in fact it was not.
Sample Dialogue Case 3: A dialogue example during which a misunderstanding ("silver bridges" were mistakenly confirmed) occurs between sender and receiver. Parts of the discourse which were not relevant for the repair are left out (marked by dots).
So, jetz weiß ich auch, was du mit silbernen Brücken meinst. ((lacht)) .

So now I know what you mean by "silver bridges". ((laughs)) .
Again, several turns between S and R are needed to fix the CG, which has been broken for some time. Similar to variant 1, the roles of S and R are swapped: R is able to derive the location at which the misunderstanding occurred and derives the corrected route instruction ("after the bus stop I should have gone left, right?"). On arriving there, R recognises several entities introduced earlier by S and, finally, establishes CG by explicitly referring to the "silver bridges" which have been introduced earlier by S.
While both variants include different reasons for non-established CGs, the solution for both variants is essentially the same: R and S swap roles, i.e. the receiver starts to introduce objects which are salient to her/him in the environment using EPCs and expects a confirmation by the sender in order to establish CG. Receivers will, then, continue to make route suggestions (see variant 2) or senders will continue to give instructions (see variant 1).

HCI Design Guidelines to foster CG establishing
Taken together, our corpus provides evidence for a general discourse pattern: S introduces a salient object by using an EPC and expects a confirmation by R about the recognition of this object in the local environment. If this is the case, CG is established and S continues to give a route instruction. If, however, R cannot identify the salient object, it is not yet part of the CG content: Restoring CG is done by swapping roles between R and S, during which R primarily uses EPCs to refer to entities which either have already been part of the CG content or which are salient to her/him. Only when CG is re-established does S continue to provide route instructions.
Overall, the dialogue analysis provides evidence that the discourse between sender and receiver is about continuously tracking the state of affairs with respect to explaining a route and localisation: Both sender and receiver can assume (A) or know (K) that the receiver's current position is correct (C) or wrong (W) in the context of the current navigation step. If and only if both sender and receiver know that the position is correct, the CG for the next instruction is successfully established. This means that a wayfinding assistance system acting as a sender has to implement a task model consisting of 16 states, of which (K, C, K, C) represents the desired final state in which the CG is successfully established (despite its use in a different domain, a similar approach has been taken successfully in [47]). In human-to-human dialogues, EPCs serve the pragmatic purpose of changing the current state of the grounding task to the final state. A wayfinding assistance system which mimics human discourse must track the state of tasks for establishing CG continuously throughout the navigation process. It has to compute a belief (i.e. a probability distribution over all 16 states) each time it receives new input from users, regardless of whether it is point-and-click or natural language based. Graphical user interfaces, however, have the advantage of avoiding additional uncertainty introduced by misinterpretation of user utterances. In any case, a user interface should allow the task state to be updated and adhere to the workflow of human-to-human dialogues (see above).
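The task model and its belief can be made concrete with a minimal sketch. It only illustrates the idea and is not the implementation of any existing system: the ordering of the state tuple (sender attitude, sender judgement, receiver attitude, receiver judgement), the Bayes-style update rule and all likelihood values are assumptions introduced here, not estimates from the corpus.

```python
from itertools import product

# Each party either assumes (A) or knows (K) that the receiver's
# current position is correct (C) or wrong (W).
ATTITUDES = ("A", "K")
JUDGEMENTS = ("C", "W")

# Assumed tuple order: (sender attitude, sender judgement,
#                       receiver attitude, receiver judgement).
STATES = list(product(ATTITUDES, JUDGEMENTS, ATTITUDES, JUDGEMENTS))
GOAL = ("K", "C", "K", "C")  # both know the position is correct: CG established


def uniform_belief():
    """Start without any preference among the 16 states."""
    return {state: 1.0 / len(STATES) for state in STATES}


def update_belief(belief, likelihood):
    """Weight each state by how well it explains the observed user input.

    `likelihood` maps a state to P(observed input | state).
    """
    weighted = {s: p * likelihood(s) for s, p in belief.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observation is impossible under every state")
    return {s: w / total for s, w in weighted.items()}


def confirmation_likelihood(state):
    """Hypothetical likelihood that the user confirms a presented landmark."""
    _, _, receiver_attitude, receiver_judgement = state
    if receiver_judgement == "W":
        return 0.1  # illustrative value: confirmations are rare when lost
    return 0.9 if receiver_attitude == "K" else 0.6  # illustrative values


belief = update_belief(uniform_belief(), confirmation_likelihood)
print(len(STATES), belief[GOAL] > uniform_belief()[GOAL])
```

In this sketch a single confirmation raises the probability of the final state but does not make it certain, which matches the observation that a mistaken confirmation can leave CG unestablished.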
From our point of view, it is, therefore, essential to use an easy-to-comprehend interface to establish CG in a sequential manner which mimics human-to-human CG negotiations through EPCs. Specifically, we are able to provide two guidelines for the implementation of such a user interface. The use of salient objects is a prerequisite according to our and earlier evidence [see e.g. 15,48,49]. Selecting the most salient object in a given spatial context is, however, a challenge in itself, because of personal preferences, personal salience, personality traits and spatial abilities [see e.g. 24].
Assuming that an object has been identified, the user interface should, first, present it and request explicit feedback from the user on whether it is recognised. If the user fails to identify the object in the environment and gives appropriate feedback, this step should be repeated with other objects (in descending order of salience) until the system can reliably assume that it agrees with the user on the current position. Clearly, system knowledge about the user position (e.g. based on GNSS) is insufficient to draw conclusions about the user's knowledge about and confidence in his/her current position. A route instruction can only be successful if the current position and the user's knowledge about this position match. In these cases the route instruction can be based on an object match. In order to achieve this match, e.g. in a speech-enabled interface, the system's output and instructions should be limited to the syntactic and semantic structure of EPCs, because users are accustomed to these expressions in navigation instructions, as the dialogues indicate. Interrupting the voice output should be enabled (e.g. using a specific keyword) when providing feedback, as the turns in our corpus show considerable overlap. A second guideline deals with the inversion of roles between sender and receiver. We provide evidence that CG is most commonly repaired by a receiver introducing new entities or referring to entities which are already part of the CG content. An ideal system, therefore, would be able to understand user input on salient objects, i.e. users can change the system's belief by providing environmental information. One potential way towards such system capabilities is to train the Natural Language Understanding module of a speech-enabled navigation user interface to detect these pragmatic purposes and update the belief about the task state accordingly.
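The first guideline can be sketched as a confirmation loop that keeps the introduction of an object and the route instruction as two distinct steps. The sketch below is a simplified illustration under stated assumptions: the `Landmark` class, the salience scores, the `user_recognises` callback, the `max_attempts` limit and the prompt wording are all hypothetical, and the role inversion of the second guideline is reduced to a single fallback prompt whose answer is not interpreted.

```python
from dataclasses import dataclass


@dataclass
class Landmark:
    name: str
    salience: float  # hypothetical score; higher means more salient


def establish_common_ground(candidates, user_recognises, max_attempts=3):
    """Present landmarks in descending salience until the user confirms one.

    `user_recognises(prompt, landmark)` stands in for explicit user feedback
    (a button press or a spoken yes/no). Returns the confirmed landmark, or
    None if none of the first `max_attempts` candidates was recognised.
    """
    ordered = sorted(candidates, key=lambda lm: lm.salience, reverse=True)
    for landmark in ordered[:max_attempts]:
        # Step 1: introduce the object with an EPC-style prompt.
        prompt = f"There is a {landmark.name}. Can you see it?"
        if user_recognises(prompt, landmark):
            return landmark
    return None


def instruct(candidates, user_recognises, action="go towards it"):
    """Issue the route instruction only after CG has been established."""
    landmark = establish_common_ground(candidates, user_recognises)
    if landmark is None:
        # Fallback: hand the initiative to the user (role inversion).
        return "Please describe what you can see around you."
    # Step 2: the instruction is the comment on the confirmed topic.
    return f"Now {action}."


candidates = [
    Landmark("bus stop sign", 0.4),
    Landmark("church", 0.9),
    Landmark("yellow house", 0.7),
]
# Simulated user who only recognises the yellow house.
print(instruct(candidates, lambda prompt, lm: lm.name == "yellow house"))
```

Understanding what the user says in response to the fallback prompt is deliberately left out of the sketch.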
It remains, however, an open research question how the system can match the information contained in user input with its knowledge about the environment: Comprehending user descriptions of their current environment as well as identifying the objects using the system's knowledge base are two unresolved challenges. A first step, however, may be the wayfinding assistance system's capability to understand that a user feels lost. One option to realise this capability in a GUI scenario is the implementation of an appropriate emergency button which initiates a dialogue to establish CG as outlined above: The system asks the user iteratively about surrounding objects (starting with the most salient one) until CG is established.
In this work we present and analyse the POPRIS-corpus of turn-by-turn landmark-based route instructions and illustrate the importance of its linguistic properties for wayfinding assistance systems. The main contribution of this novel corpus is based on the full transcripts of the dialogues between pairs of participants who were unfamiliar with the experimental area. In addition to transcripts of utterances, the corpus contains sense of direction and personality data and is, therefore, expected to be valuable for a variety of disciplines.
The corpus data provide evidence for the dominance of EPCs in establishing CG. Both senders and receivers use these to introduce new salient objects or to refer to those mentioned before in case CG is not established. Two variants were discussed in which CG is not established: Variant 1 is a case of misinterpretation [see 50]: The sender refers to a right-hand bend not recognised by the receiver, whereas the sender cannot interpret the receiver's reference to the church. In variant 2, however, the reason is a misunderstanding [see 50], i.e. the receiver confirms the recognition of different silver bridges than those the sender actually referred to.
Based on our empirical findings we derived a negotiation protocol between a user and an assistance system, which is suitable to establish CG. By explicitly requiring feedback on the recognition of a salient object before a route instruction is based on it, the proposed assistance system can help resolve ambiguities, avoid uncertainty and reduce cognitive load. Therefore, user feedback is requested at the decision point level and continues until CG is established. This is in contrast to [34], in which it is suggested to stop negotiations between system and user when the decision point level is reached. Generally speaking, it is interesting to see that the formulation and presentation of route instructions has seen major research interest; however, state-of-the-art user interfaces using these results are (with the exception of [34]) implemented on the assumption that the current position and the user knowledge about this position match the system's assumptions/knowledge of both ((K, C, K, C), see above) and that this match is reached immediately each time a new instruction is presented. This is in sharp contrast to evidence [see e.g. 51] that maps generally allow multiple possible interpretations, rendering errors in CG very likely. The evidence and approach we derive, however, are in line with the guideline presented in [52], as the confirmation of CG resembles the confirmation of correct decision making. The closest approach to our proposed assistance system is described in [53]; Bauer describes an experiment during which participants were required to press a button labelled "Ziel erkannt" (goal perceived) [53, p. 89] when they had the impression that they understood the landmark-based instruction as a whole. This prototype consequently takes a holistic view of an instruction.
In contrast to this holistic view, however, we provide evidence for the fact that introducing a salient object using an EPC and providing an instruction should be two distinct steps. An assistance system which mimics this way of human-to-human CG management is likely to resolve trust issues, which might occur when a system uses references to landmarks (see e.g. [54] for this claim). In addition to that, since the negotiation phase requires users to search for and recognise objects in the environment, the acquisition of spatial knowledge is supported, thereby counterbalancing potential negative effects of extensive navigation system use [see e.g. 55]. The major limitation which applies in terms of implementing such systems is the availability of large-scale salience values for objects, despite theoretical [see e.g. 21,22,24] and empirical [see e.g. 25,56,26,23,28] efforts. Currently, no large-scale geospatial database exists from which a wayfinding assistance system can extract candidate objects and use these for establishing the CG. The issue of how to fill a database with high-quality entries is an open question for research and practice of navigation systems. VGI [57] approaches may possibly offer a way to acquire this kind of geospatial knowledge on a large scale.

Conclusion and Future Work
In this paper we shed light on the use of existential-presentative constructions to establish Common Ground (CG) in a wayfinding task and their implications for the design of wayfinding assistance systems. Specifically, we present a corpus of on-line route instructions given in a turn-by-turn manner over the phone in-situ. Existential-presentative constructions (EPCs) appear to be the most common way of introducing salient objects (landmarks), based on which, first, CG is negotiated and, second, an actual route instruction is given referring to this object. This human-to-human behaviour is, from our point of view, a suitable basis for a system's behaviour. A wayfinding assistance system should request explicit user feedback on whether the salient object it refers to is recognised or not. If not, it should continue to ask for recognition of other objects in order to establish CG. A route instruction should be issued only then. In addition to that, the user interface should allow users to initiate dialogues (e.g. by means of an emergency button) about objects they recognise in the environment in order to ensure correct self-positioning of the user. Three different strands of future work are planned. A first strand is dedicated to in-situ and virtual reality HCI studies in order to determine suitable interaction techniques for cases in which CG is and those in which it is not established. To this end, we plan a series of virtual reality and in-situ experiments. We, furthermore, plan to analyse how users of assistance systems make sense of route instructions based on think-aloud protocols and if/how this might differ from human-to-human route communication (in particular with respect to the information needs expressed using EPCs). These differences may have an impact on the efficiency of route instructions provided by the system. Finally, we intend to add an implementation of the task model for establishing the CG to an existing navigation system that currently cannot track a belief state based on user feedback: We plan to use the speech data we have collected to train an NLU component that helps in tracking the state of grounding tasks by analysing user utterances.