From Space to Language

The aim of this research perspective is to contribute to linguistic research questions by incorporating geographic information into the analysis. The following ongoing projects are described below:

Distribution of Dogon Languages

Motivation and Goal: The Dogon language family consists of about 20 languages spoken in an area roughly the size of Switzerland, located in Mali along the border with Burkina Faso. Dogon languages have not yet been placed within the broader classification of African languages. They also show a complex spatial pattern of linguistic diversity along a large and often impassable natural cliff, the Bandiagara Escarpment. The goal of this project is to quantify the influence of the escarpment and to test whether accessibility can account for some of the unexplained linguistic features, such as extensive lexical borrowing.

Some Dogon languages (dots) connected by least cost paths. Basemap: SRTM Elevation Model

Cooperation: The Dogon project is carried out in cooperation between the General Linguistics Group of the University of Zurich, represented by Steven Moran and Prof. Balthasar Bickel, and the GISLab.

State and Outcome: A first R toolbox, which automatically compares different distance measures with linguistic similarity at global and local levels, has been completed. The distance measures include Euclidean distance, least cost paths and horizontal/vertical distances. After an evaluation phase, a generic adaptation of our toolbox will be made available for public use.
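The R toolbox itself is not reproduced here. As a minimal sketch of the basic comparison step, under our own assumptions, the Python code below computes a geographic distance for every pair of sites and correlates it with a linguistic distance using Pearson's r. The function names, the toy coordinates and the linguistic distances are hypothetical and invented for illustration; they are not Dogon data. Any other measure, such as a least-cost distance, can be passed in as `geo_dist`.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Straight-line distance between two projected (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_distances(sites, ling_dist, geo_dist=euclidean):
    """Correlate a geographic distance measure with linguistic distance
    over every unordered pair of sites (hypothetical helper).

    sites:     {name: (x, y)} in projected coordinates
    ling_dist: {frozenset({a, b}): linguistic distance}
    geo_dist:  any function (p, q) -> distance
    """
    geo, ling = [], []
    for a, b in combinations(sorted(sites), 2):
        geo.append(geo_dist(sites[a], sites[b]))
        ling.append(ling_dist[frozenset((a, b))])
    return pearson(geo, ling)


# Toy data, invented for illustration only.
sites = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
ling = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.4,
    frozenset(("B", "C")): 0.2,
}
r = compare_distances(sites, ling)
```

Here the linguistic distances are exactly proportional to the Euclidean distances, so `r` is 1.0 up to rounding. With real data, a plain correlation over pairs ignores the dependence between pairs that share a site, which is why permutation-based tests (e.g. Mantel tests) are commonly used for significance.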

Expansion of Bantu Languages

Motivation and Goal: The expansion of the Bantu languages is the subject of competing theories, such as the early-split and late-split paradigms. We aimed to reconstruct the most probable expansion of the Bantu languages, starting in Nigeria and proceeding east and south along the paths associated with the lowest travel costs. To this end, we conducted a least cost path analysis with the aim of reconstructing the Bantu language tree spatially.
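A least cost path over a cost surface can be computed with Dijkstra's shortest-path algorithm. The Python sketch below is not the implementation used in the thesis; the 4-connected grid, the rule that a step costs the mean of the two cell values, and the toy raster are our own assumptions. In a real analysis the cost raster might be derived from terrain and land cover.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected cost raster (hypothetical helper).

    cost:  list of rows of non-negative cell costs
    start, goal: (row, col) tuples
    A step between orthogonal neighbours costs the mean of their values.
    Returns (total_cost, path); path is [] if the goal is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}   # cheapest known accumulated cost per cell
    prev = {}             # predecessor on the cheapest known route
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > best.get(cell, float("inf")):
            continue  # stale queue entry, a cheaper route was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]


# Toy raster: a costly obstacle in the centre that routes should avoid.
raster = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]
total, path = least_cost_path(raster, (0, 0), (2, 2))
```

On this raster the cheapest route goes around the expensive centre cell and costs 4.0; either detour (via the top right or the bottom left) is equally cheap, so which one is returned depends on the order in which cells are explored.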

Language tree of the Bantu languages and its projection onto a least cost path realisation. Source: MSc thesis, C. Wirth, 2014

Cooperation: The project is a jointly supervised master's thesis involving Prof. Robert Weibel (Geography, Zurich), Prof. Balthasar Bickel (General Linguistics, Zurich) and the GISLab.

State and Outcome: Christian Wirth successfully defended his master's thesis in September 2014. Its results provide a very good starting point for further investigations (MSc thesis, C. Wirth, 2014).

Toolbox for Analysing Dialect Data

Motivation and Goal: Dialect data, for instance on syntax in Switzerland, often has the same characteristics: it is unevenly distributed in space (particularly when collected through online questionnaires or mobile apps), the number of answers varies from location to location, and each linguistic feature can take a different number of categorical values. We therefore decided to develop a toolbox that formalizes a standard spatial analysis workflow for dialect data. The analysis produces visual as well as statistical output and allows a large series of tests to be conducted one step at a time. The toolbox will be freely available as an R package.
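Because the number of answers differs between locations, raw counts are hard to compare on a map. A common first step is to convert them into relative shares per location, as in the figure below. The following minimal Python sketch shows that step under our own assumptions: one categorical feature, answers given as (location, variant) pairs, and a hypothetical `min_answers` threshold for dropping very sparsely sampled places. It is not the R package itself, and the place names and variants are toy data.

```python
from collections import Counter, defaultdict


def relative_distribution(responses, min_answers=1):
    """Per-location shares of each variant of one categorical feature.

    responses:   iterable of (location, variant) pairs, one per answer
    min_answers: locations with fewer answers are dropped (hypothetical
                 threshold guarding against very sparsely sampled places)
    Returns {location: {variant: share}}; shares sum to 1 per location.
    """
    counts = defaultdict(Counter)
    for location, variant in responses:
        counts[location][variant] += 1
    shares = {}
    for location, counter in counts.items():
        total = sum(counter.values())
        if total < min_answers:
            continue
        shares[location] = {v: n / total for v, n in counter.items()}
    return shares


# Toy answers for one feature; Zurich has three answers, Bern only one.
responses = [
    ("Zurich", "variant_a"),
    ("Zurich", "variant_a"),
    ("Zurich", "variant_b"),
    ("Bern", "variant_b"),
]
shares = relative_distribution(responses)
```

Working with shares rather than counts keeps features with different numbers of categorical values in the same format, so the same mapping and testing steps can be applied to each feature in turn.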

Relative distribution of a feature in different Dutch-speaking regions

Cooperation: This project was set up in close cooperation between Prof. Elvira Glaser and Philip Stöckle from the SNSF project "Modelling morphosyntactic area formation in Swiss German (SynMod)", the Institute of German Studies (Dutch division) and the GISLab.

State and Outcome: The project is currently in its development phase, and a simple first version of the tool has been finished. A student assistant has been hired to work exclusively on this project from February 2015 onwards.

Global Language Similarities Explored in Large-p/Small-n Data Collections from Linguistic Typology

Motivation and Goal: Only recently have large compilations of typological data been homogenized and made freely available. Such data usually comes as a matrix of some four to six hundred categorical (i.e. multinomial) linguistic features for several hundred languages worldwide (i.e. large-p/small-n, although n is not actually that small). One of the overarching questions such data might answer is: how are the world's languages related, and which theory of historical linguistics does this relatedness support? Using the data for this purpose is challenging, however, because of its categorical character, its large number of features (large p), its uneven spatial distribution and its many missing (NA) values. We therefore introduced a procedure that reduces dimensionality while still reflecting the impact of individual linguistic features, and of NA values in particular. Our approach also accounts for the spatial character of the data and thus combines dimension reduction with spatial analysis.
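The procedure itself is not described here in enough detail to reproduce. As a hedged illustration of one preliminary step that categorical data with many NA values typically needs, the sketch below computes a simple-matching distance between languages with pairwise deletion of NA values. This is a generic technique chosen for illustration, not necessarily the one used in the project; the function names, language names and feature values are hypothetical toy data. Returning the number of compared features alongside the distance keeps the influence of NA values visible.

```python
from itertools import combinations


def matching_distance(a, b):
    """Simple-matching distance between two languages' feature vectors.

    Features coded None (NA) in either language are skipped pairwise, so
    the distance is the share of mismatches among features observed in
    both. The number of compared features is returned too, because a
    distance based on few shared features is less reliable.
    Returns (distance, n_shared), or (None, 0) if nothing overlaps.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None, 0
    mismatches = sum(1 for x, y in shared if x != y)
    return mismatches / len(shared), len(shared)


def distance_table(languages):
    """Pairwise (distance, n_shared) for {language: feature vector}."""
    return {
        (p, q): matching_distance(languages[p], languages[q])
        for p, q in combinations(sorted(languages), 2)
    }


# Toy feature vectors (None marks a missing value).
languages = {
    "lang_a": ["SOV", "prefix", None, "yes"],
    "lang_b": ["SOV", "suffix", "tone", "yes"],
    "lang_c": [None, None, "tone", None],
}
table = distance_table(languages)
```

In this toy table, `lang_a` and `lang_c` share no observed feature, so no distance can be computed for that pair; with real typological data, such pairs and pairs resting on very few shared features are exactly where the handling of NA values matters most for any subsequent dimension reduction.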

Spatial distribution of different measures of linguistic similarity. Basemap: OSM

Cooperation: The project is a cooperation between Prof. Balthasar Bickel (University of Zurich) and the GISLab.

State and Outcome: The project is still in its early stages. Integrating the approach into a broader research context is currently being evaluated.

Uneven Distribution of Morphology in Different Language Families

Motivation and Goal: Morphological structure appears to be unevenly distributed across the globe. This project aims to explain the impact of language contact on morphology by combining historical linguistics, phylogenetics and geographic analysis. In the first step, we establish the necessary background in the relevant linguistic theories; in the second step, we apply these insights, together with new methodological approaches, to regions for which only sparse historical linguistic information is available.

Cooperation: This project is planned as an SNSF Sinergia collaboration between linguistics and geography in Zurich and Bern.

State and Outcome: The project team submitted the proposal in mid-January.