The goal of this study is to identify, describe, and visualize patterns of regional variation in Croatian and Serbian linguistic features extracted from Twitter messages and, potentially, other sources of computer-mediated communication. The main research question we address is: To what extent do linguistic areal patterns correspond to current state borders? In other words, do linguistic regions coincide with administrative regions?
The geographic spread of features considered typically Croatian or typically Serbian is currently unknown. A gradual transition from one feature setting to another would indicate that Croatian and Serbian are regional variants of the same linguistic entity. A sudden transition at a state border would indicate an actual separation between the two variants. Our research aims to provide empirical evidence of the current feature distribution and to interpret the findings in light of historical processes.
This work is part of the ReLDI project.