Text Group

The size of language corpora (collections of machine-readable texts) is currently measured in billions of tokens. These vast records of language use are a rich potential source of data for linguistic research. This opportunity, however, comes with a great challenge: how do we turn hundreds of thousands of observations into linguistic evidence?

In the corpus-linguistic laboratory (CorpusLab), computers are used as lab instruments. We extract data from language corpora automatically using natural language processing. We measure linguistic phenomena based on corpus counts. We apply statistical modelling and inference to understand the structures and the rules behind the observed language use.

We are especially interested in studying linguistic variation in space. We develop methods and tools for comparing languages and linguistic structures using corpora.

Some current research topics

Visualisation by Phillip Ströbel

Further information

The ArchiMob corpus of spoken Swiss German
The ReLDI project
How to make data reusable?

URPP Language and Space
Language and Space Lab
