Research

The main research domain of the CorpusLab is linguistic variation in space. We develop methods and tools for comparing languages and linguistic structures using corpora. 

Automatic Glossing and Morphological Analysis

The goal of this work is to automate linguistic glossing (morphological segmentation and analysis), reducing the cost of glossing corpora manually and improving its consistency. In this way, we contribute to developing new linguistic resources that allow us to make new quantitative observations about linguistic structures and their cross-linguistic correspondence.

Most of this work is done in collaboration with the ACQDIV project.


Regional vs. Standard Variation in Croatian and Serbian

The goal of the study is to identify, describe, and visualise the patterns of regional variation in linguistic features extracted from Twitter messages and, potentially, other sources of computer-mediated communication. The main research question that we address is: to what extent do linguistic areal patterns correspond to current state borders? In other words, do linguistic regions correspond to administrative regions or not?

The spread of the features considered typically Croatian and typically Serbian is currently unknown. A gradual transition from one feature setting to the other would indicate that Croatian and Serbian are just regional variants of the same linguistic entity. A sudden transition influenced by a state border would indicate an actual separation between the two variants. Our research is intended to provide empirical evidence of the current feature distribution and an interpretation of the findings in light of historical processes.

This work is part of the ReLDI project.


Corpus-based Typology of Morphological Richness

The goal of the project is to map a wide range of languages to a scale (or a space) of morphological richness defined on the basis of corpus observations and to examine the geographical distribution of morphologically rich vs. morphologically poor languages. The work on the project up to now has involved one international collaboration 