Research Group “Spatial References”

The Spatial References Research Group addresses two key themes of relevance to the URPP Language and Space. The first theme is methodological and concerns the development of methods for extracting spatial information from text. Central to this theme is automatic named entity recognition, that is, the identification and spatial grounding of references to specific locations in language. In particular, we are interested in texts that are rich in spatial information and contain fine-grained toponyms. The Text+Berg corpus is one example: its mountaineering reports give detailed positional and directional information and describe the relevant landmarks. Our research focuses on methods for the automatic recognition and analysis of different types of toponyms, both natural entities (e.g. mountains, glaciers, lakes) and man-made entities (e.g. cabins, dams, towers). The analysis consists of recognition, classification, building co-reference chains, and grounding. It also includes resolving geo/non-geo ambiguities (e.g. Mönch as a mountain name or as the German word for monk) and geo/geo ambiguities (e.g. more than a dozen mountains in Switzerland are called Schwarzhorn). Ongoing work in both geography and computational linguistics will be harnessed to develop a set of tools capable of processing and visualizing results over time.

Recent approaches to this problem rely on machine learning and investigate statistical and hybrid perspectives. To date, they have concentrated on corpora for which annotations already exist, typically sources such as Wikipedia and Twitter, where unstructured texts come with georeferenced metadata. However, we argue that there is a pressing need to develop methods on more representative corpora, written in more traditional ways. We therefore propose in this group to build manually annotated corpora for training and evaluating such systems, focused specifically on texts rich in spatial information such as the Text+Berg corpus.
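
The recognition and grounding steps described above, including geo/geo disambiguation, can be illustrated with a minimal Python sketch. It is not the group's method: it assumes a toy gazetteer with invented coordinates and identifiers, matches toponyms by exact token lookup, and resolves an ambiguous name by choosing the candidate closest to the unambiguous toponyms mentioned in the same text, a simple proximity heuristic swapped in purely for illustration. All names in the code (Place, GAZETTEER, recognise, ground) are hypothetical, and geo/non-geo ambiguity is not handled.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    ident: str   # hypothetical gazetteer identifier
    name: str    # surface form of the toponym
    kind: str    # coarse class, e.g. "mountain" or "cabin"
    lat: float   # latitude in degrees (invented for this sketch)
    lon: float   # longitude in degrees (invented for this sketch)


# Toy gazetteer: coordinates are illustrative, not real survey data.
GAZETTEER = [
    Place("moench", "Mönch", "mountain", 46.5, 8.0),
    Place("schwarzhorn-a", "Schwarzhorn", "mountain", 46.6, 8.1),
    Place("schwarzhorn-b", "Schwarzhorn", "mountain", 46.0, 10.0),
]


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def recognise(text: str) -> list[str]:
    """Return the tokens of `text` that exactly match a gazetteer name."""
    names = {place.name for place in GAZETTEER}
    return [token for token in re.findall(r"\w+", text) if token in names]


def ground(text: str) -> dict[str, Place]:
    """Map each recognised toponym to one gazetteer entry."""
    candidates = {
        name: [place for place in GAZETTEER if place.name == name]
        for name in recognise(text)
    }
    # Names with exactly one candidate serve as anchors for the others.
    anchors = [places[0] for places in candidates.values() if len(places) == 1]
    grounded = {}
    for name, places in candidates.items():
        if len(places) == 1 or not anchors:
            # Unambiguous, or no context to decide: take the first entry.
            grounded[name] = places[0]
        else:
            # Geo/geo ambiguity: pick the candidate nearest to the anchors.
            grounded[name] = min(
                places,
                key=lambda p: sum(haversine_km(p, a) for a in anchors),
            )
    return grounded


if __name__ == "__main__":
    sentence = "From the Mönch we looked across to the Schwarzhorn."
    for name, place in ground(sentence).items():
        print(name, "->", place.ident)
```

A trained system would replace the exact lookup with a statistical or hybrid recogniser and would be evaluated against manually annotated text; the sketch only shows how recognition, candidate lookup, and grounding fit together.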

The second theme concerns the use of spatially rich texts as a way of understanding culture. Ekaterina Egorova’s PhD thesis is an example of such ongoing work, which we seek to broaden in a collaborative project. By integrating methodological advances such as those described above, we can focus on an object of study (e.g. mountains and cultural understandings thereof, or descriptions of space as produced in postcards) and use linguistically informed methods to analyze corpora containing spatially referenced descriptions. For instance, by examining how landscape is described in oral and written sources and the meanings attached to it, we will gain insights into the influence of language on human-landscape relations. Studying the way we talk about landscape provides valuable clues regarding its meaning and the role of cultural practices, and has direct relevance to decision-making processes.