Teodora Vuković

Teodora Vuković, MA

Doctoral student at the CorpusLab

Address: Freiestrasse 16, 8032 Zürich

Room number: FRF E 5


PhD project

Corpus-based Study of Post-positive Articles in Torlak

The research focuses on the post-positive definite article in the South Slavic Torlak dialects, spoken in the area near the border between Serbia, Bulgaria and Macedonia. Bulgarian and Macedonian are the two Slavic languages that use a post-posed definite article, whereas standard Serbian lacks this feature. The transitional Torlak dialects also have post-posed definite articles, but they do not use them in the same way as Macedonian and Bulgarian. The research draws partly on Olga Mladenova's study of definiteness in Bulgarian and on findings by Balkan Slavicists such as Andrey Sobolev (1998), Aleksandar Belić (1905) and Olga Mišeska Tomić (2006). The analysis will examine the pragmatic and semantic contexts that enable the use of the articles in these dialects. Pragmatic context includes the distinction between new and given information, while semantic context refers to semantic classes of nouns, such as human, non-human, location and time reference, since these may affect article use (Laury 1997).
The research sample comes from fieldwork recordings collected in rural parts of South-Eastern Serbia between 2015 and 2016. The material amounts to over 300 hours of audio and video recordings, of which 100 hours have been selected for the research. The aim is to create a corpus designed specifically for the analysis of the post-posed definite article. To make this analysis possible, several additional layers of annotation are required: the basic ones are part-of-speech and morphosyntactic tags, and the more specific ones concern information structure and the semantic classification of nouns.
The recordings are currently being transcribed, and the corpus now contains around 100,000 tokens. The texts are annotated with part-of-speech and morphosyntactic tags using the standard Serbian tagger developed within the ReLDI project (Ljubešić et al. 2016), which will be adapted specifically for this dialect. Together with the information-structure and semantic annotation, these layers will enable a precise quantitative analysis.


Belić, A. (1905). Dijalekti istočne i južne Srbije. Beograd: Srpska kraljevska akademija.
Laury, R. (1997). Demonstratives in Interaction: The Emergence of a Definite Article in Finnish. Amsterdam: John Benjamins.
Ljubešić, N., Klubička, F., Agić, Ž., Jazbec, I. (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). Portorož, Slovenia.
Mišeska Tomić, O. (2006). Balkan Sprachbund Morpho-Syntactic Features. Dordrecht: Springer.
Sobolev, A. (1998). Sprachatlas Ostserbiens und Westbulgariens. Marburg/Lahn: Biblion-Verlag.


Supervisors: Barbara Sonnenhauser, Tanja Samardžić

Funding Source: Swiss Government Excellence Scholarships for Foreign Scholars and Artists


2016 – Present University of Zurich, Slavisches Seminar, CorpusLab

Doctoral candidate
Research topic: Corpus-based Study of Post-positive Articles in Torlak (Mentors: Barbara Sonnenhauser and Tanja Samardžić)

2013 – 2015 University of Belgrade, Faculty of Philology

MA in Corpus Linguistics
Thesis: Creating a Model for a Dialectological Corpus of the Bunjevac Dialect (Mentor: Maja Miličević) 

2009 – 2013 University of Belgrade, Faculty of Philology

BA in General Linguistics

2005 – 2009 Philological High School, Belgrade

English Language and Literature

Projects and previous positions

September 2015 – Present Protecting immaterial heritage of the Torlak vernacular (The Institute for Balkan Studies of the Serbian Academy of Sciences and Arts, Museum of Ethnography in Belgrade) The project aims to protect and preserve the endangered Torlak vernacular by collecting linguistic samples in audio and video format and presenting them through a web page with various search options. I am one of the field researchers and the corpus linguistics specialist in charge of creating an Internet archive of the Torlak vernacular. (Project leader: Biljana Sikimić)
July 2015 – Present Constructing Narratives (The Institute for Balkan Studies of SASA and the Institute for Slavic Studies of Humboldt University of Berlin) The goal of the project is to create a corpus of oral narratives in the light of Construction Grammar. My role is to provide transcription training in EXMARaLDA and to supervise the transcription process. I am also involved in the annotation and the creation of the corpus. (Project leaders: Christian Voss, Marija Mandić, Philipp Wasserscheidt)
May 2015 – Present Webdict – Dictionary of the Shokac dialect (The Institute for Balkan Studies of SASA) Editor and co-author of an online dictionary of the Shokac dialect (Project leader: Biljana Sikimić)
2012 – Present Webdict – Dictionary of the Bunjevac dialect (The Institute for Balkan Studies of SASA)
2012 – Present External associate at the Institute for Balkan Studies of the Serbian Academy of Sciences and Arts (Fields of research: corpus linguistics, lexicography, field linguistics)
November 2011 Threads – Project assistant (short-term development project “Threads” (2011) for children living and working on the streets of Belgrade; an independent project of four University of Belgrade students in cooperation with the Center for Youth Integration and the American Embassy in Belgrade)