Research Group “Spatial References”

The Spatial References Research Group addresses two key themes of relevance to the URPP Language and Space. The first theme is methodological and is concerned with developing methods to extract spatial information from text. Central to this theme is automatic named entity recognition, that is, the identification of references to specific locations in language and their spatial grounding. In particular, we are interested in analyzing texts that are rich in spatial information and contain fine-grained toponym information. The Text+Berg corpus is one such example: its mountaineering reports specify fine details of position and direction and describe the relevant landmarks. Our research focuses on methods for the automatic recognition and analysis of different types of toponyms, both natural features (e.g. mountains, glaciers, lakes) and man-made features (e.g. cabins, dams, towers). The analysis consists of recognition, classification, co-reference chain construction and grounding. It also includes resolving geo/non-geo ambiguities (e.g. Mönch as a mountain name or as the word for monk) and geo/geo ambiguities (e.g. there are more than a dozen mountains in Switzerland called Schwarzhorn). Recent approaches to this problem rely on machine learning, exploring both statistical and hybrid methods. To date, these have concentrated on corpora for which annotations already exist, such as Wikipedia or Twitter, where unstructured texts come with georeferenced metadata. However, we argue that there is a pressing need to develop methods on more representative corpora written in more traditional styles. We have therefore launched initiatives to manually annotate corpora for training and evaluating such systems, focusing specifically on texts rich in spatial information such as the Text+Berg corpus.
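To make the geo/geo disambiguation step more concrete, here is a minimal sketch of one common heuristic, offered as an illustration rather than as the group's own method: among several gazetteer candidates for an ambiguous name such as Schwarzhorn, choose the one closest to toponyms in the same passage that have already been grounded. The candidate identifiers (SH-1 to SH-3) and their coordinates are invented toy values, not real gazetteer entries, and the names `Candidate`, `haversine_km` and `disambiguate` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry for an ambiguous toponym (toy data, not real coordinates)."""
    name: str
    ident: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def disambiguate(candidates, context_points):
    """Return the candidate with the smallest summed distance to grounded context toponyms."""
    if not context_points:
        raise ValueError("need at least one grounded context toponym")

    def cost(c: Candidate) -> float:
        return sum(haversine_km(c.lat, c.lon, lat, lon) for lat, lon in context_points)

    return min(candidates, key=cost)


# Hypothetical example: three same-named peaks and two already grounded
# toponyms from the same report (all coordinates are made up).
candidates = [
    Candidate("Schwarzhorn", "SH-1", 46.0, 7.0),
    Candidate("Schwarzhorn", "SH-2", 46.5, 8.0),
    Candidate("Schwarzhorn", "SH-3", 46.8, 9.8),
]
context = [(46.8, 9.85), (46.75, 9.8)]
best = disambiguate(candidates, context)
print(best.ident)
```

A practical system would combine such a proximity heuristic with other evidence, such as the feature type expected in context or co-reference information; the sketch only covers the spatial part.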

The second theme relates to using spatially rich texts as a way of understanding culture. Ekaterina Egorova’s PhD thesis was one example of such work, focusing on particular kinds of spatial language (for example, fictive motion) in a particular genre of texts (mountaineering narratives). Manuel Bär’s recently started PhD takes a different perspective, exploring the potential of gamification for collecting rich in situ descriptions of landscape. It thus addresses the need for methods suited to the targeted collection of natural language, and it provides rich material with which to explore spatial references related to perception. By integrating methodological advances such as those described above, we can focus on an object of study (e.g. mountains and their cultural understanding, or descriptions of space in postcards) and use linguistically informed methods to analyze corpora containing spatially referenced descriptions. In this way, we aim to gain insights into the influence of language on human-landscape relations by examining how landscape is described in oral and written sources and the meanings attached to it. Studying the way we talk about landscape provides valuable clues to its meaning and to the role of cultural practices, and it is directly relevant to decision-making processes.