I am a computational linguist with a background in language theory and machine learning. My research develops computational text processing methods and uses them to test theoretical hypotheses about how language actually works. I hold a PhD in computational linguistics from the University of Geneva, where I studied in the Computational Learning and Computational Linguistics (CLCL) group. I am committed to promoting and facilitating the use of computational approaches in the study of language.
Moran, S., C. Bentz, X. Gutierrez-Vasques, O. Sozinova and T. Samardžić (2022). "TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP". In Proceedings of The International Conference on Language Resources and Evaluation (LREC), Marseille, France, 1150–1158.
Samardžić, T. and N. Ljubešić (2021). "Data Collection and Representation for Similar Languages, Varieties and Dialects". In M. Zampieri and P. Nakov (eds.) Similar Languages, Varieties, and Dialects: A Computational Perspective, Studies in Natural Language Processing. Cambridge University Press Pre-print
Ruzsics, T., O. Sozinova, X. Gutierrez-Vasques and T. Samardžić (2021). "Interpretability for Morphological Inflection: from Character-level Predictions to Subword-level Rules". In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 3189–3201.
Gutierrez-Vasques, X., C. Bentz, O. Sozinova, and T. Samardžić (2021). "From characters to words: the turning point of BPE merges". In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 3454–3468.
Nigmatulina, I., T. Kew, T. Samardžić (2020). "ASR for non-standardised languages with dialectal variation: the case of Swiss German". In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial2020), COLING 2020 Barcelona, Spain.
Kew, T., I. Nigmatulina, L. Nagele, T. Samardžić (2020). "UZH TILT: A Kaldi recipe for Swiss German speech to standard German text". In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS). Zurich, Switzerland.
Schmidt, L., L. Linder, S. Djambazovska, A. Lazaridis, T. Samardžić, C. Musat (2020). "A Swiss German Dictionary: Variation in Speech and Writing". In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France.
Ruzsics, T., M. Lusetti, A. Göhring, T. Samardžić, and E. Stark (2019). "Neural text normalization with adapted decoding and PoS features". Natural Language Engineering 25(5), 585-605. Pre-print
Scherrer, Y., T. Samardžić, E. Glaser (2019). "Digitising Swiss German -- How to process and study a polycentric spoken language". Language Resources and Evaluation 53, 735-769.
Ljubešić, N., M. Miličević Petrović, and T. Samardžić (2019). "Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue". Journal of Linguistic Geography 6(2), 100-124. Pre-print
Ljubešić, N., M. Miličević Petrović, and T. Samardžić (2019). "Language accommodation on Twitter: The case of Serbian". Slavistična revija 67(1), 87-106. (In Croatian)
Samardžić, T. and P. Merlo (2018). "Probability of external causation: an empirical account of cross-linguistic variation in lexical causatives". Linguistics 56(5), 895-939. Pre-print
Lusetti, M., T. Ruzsics, A. Göhring, T. Samardžić, and E. Stark (2018). "Encoder-decoder methods for text normalization". In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), COLING 2018, Santa Fe, NM, USA, 18-28.
Samardžić, T., M. Cieliebak, and J. M. Deriu (2018). "Future Actions for Swiss German — Workshop Results at SwissText 2018". In Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, 95-99.
Zampieri, M., S. Malmasi, P. Nakov, A. Ali, S. Shon, J. Glass, Y. Scherrer, T. Samardžić, N. Ljubešić, J. Tiedemann, C. van der Lee, S. Grondelaers, N. Oostdijk, A. van den Bosch, R. Kumar, B. Lahiri, and M. Jain (2018). "Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign". In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), COLING 2018. Santa Fe, NM, USA, 1-17.
Batanović, V., N. Ljubešić, and T. Samardžić (2018). "SETimes.SR – A reference training corpus of Serbian". In Proceedings of the Conference on Language Technologies & Digital Humanities 2018, Ljubljana, Slovenia, 11-18.
Vuković, T. and T. Samardžić (2018). “Areal distribution of the post-positive article in Timok dialect of Torlak”. In Timok: Field Research in Folklore and Language 2015-2017, Knjaževac, Serbia: Public Library Knjaževac, 181-201 (In Serbian).
Ruzsics, T. and T. Samardžić (2017). "Neural sequence-to-sequence learning of internal word structure". In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184-194.
Derungs, C. and T. Samardžić (2017). "". International Journal of Geographical Information Science 32(5), 856-873. free eprint
Bentz, C., D. Alikaniotis, T. Samardžić, and P. Buttery (2017). "Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology". Journal of Quantitative Linguistics 24(2-3), 128-162.
Samardžić, T., M. Starović, Ž. Agić, and N. Ljubešić (2017). "Universal dependencies for Serbian in comparison with Croatian and other Slavic languages". In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Valencia, Spain, 39-44.
Ljubešić, N., T. Samardžić, and C. Derungs (2016). "TweetGeo: A tool for collecting, processing and analysing geo-encoded linguistic data". In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan.
Bentz, C., T. Ruzsics, A. Koplenig, and T. Samardžić (2016). "A comparison between morphological complexity measures: Typological data vs. language corpora". In Proceedings of the Workshop Computational Linguistics for Linguistic Complexity (CL4LC). Osaka, Japan.
Samardžić, T., Y. Scherrer, and E. Glaser (2016). "ArchiMob - A corpus of spoken Swiss German". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.
Samardžić, T., and M. Miličević (2016). "A framework for automatic acquisition of Croatian and Serbian verb aspect from corpora". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.
Ljubešić, N., T. Erjavec, D. Fišer, T. Samardžić, M. Miličević, F. Klubička, and F. Petkovski (2016). "Easily accessible language technologies for Slovene, Croatian and Serbian". In Proceedings of the Conference on Language Technologies & Digital Humanities. Ljubljana, Slovenia.
Samardžić, T. (2017) "Basic natural language processing for Swiss German texts", SwissText 2017 video
Simonović, M. and T. Samardžić (2013) "Exponence, productivity and default pattern - A study of verb aspect in Serbo-Croatian", International Congress of Linguists (ICL), Geneva, Switzerland.
Samardžić, T. (2009) “Light verbs and the lexical category bias of their complements”, Aston Corpus Conference for Postgraduate Researchers, Aston University, Birmingham, UK.
"Automatic text processing for the study of language", Habilitation thesis, University of Zurich.
"Dynamics, causation, duration in the predicate-argument structure of verbs: A computational approach based on parallel corpora", PhD dissertation, University of Geneva.
"Semantic roles in natural language processing and in linguistic theory", Predoctoral thesis, University of Geneva.
“Light verbs and the lexical category bias of their complements”, DEA thesis, University of Geneva.
“Reflexivization of transitive three valence verbs in novištokavski standard language diasystem”, MA thesis, University of Belgrade (in Serbian)
|2014 – present||Institute of Computational Linguistics, University of Zurich:
Doctoral Programme in Applied Linguistics, ZHAW School of Applied Linguistics and the USI Faculty of Communication, Culture and Society
- Designing empirical studies in linguistics
|2020 – 2021||Institute of Computational Linguistics, University of Zurich:
- Processing non-standard language (BA,MA) UZH Course Catalogue
|2018 – 2019||Linguistics Department, Computer Science Department, University of Geneva (replacing Paola Merlo):
- Natural language processing (MA)
- Empirical methods in language processing (Neural sequence-to-sequence methods, MA)
|2019||Institute of Computational Linguistics, University of Zurich:
- Programming for linguists (Python and R) (MA) OLAT
|2019||German Department, University of Zurich:
- Automatic text processing for the study of Swiss German (BA/MA) OLAT
|2017 – 2018||Institute of Computational Linguistics, University of Zurich:
- Linked and multilingual resources (MA) OLAT
|2014 – 2017||Institute of Computational Linguistics, University of Zurich:
- Cross-linguistic transfer of lexical semantic representations (MA) OLAT
|2014 – 2015||Institute of Slavic Languages, University of Bern:
- Automatic analysis of the languages of former Yugoslavia (BA)
|2012 – 2013||Linguistics Department, LATL, University of Geneva:
- Empirical methods and script languages (Python) (MA)
- Artificial intelligence (BA)
|2004 – 2012||Department of General Linguistics, University of Belgrade:
- Introduction to general linguistics (BA)
- Applied linguistics (Awk) (BA)
- Introduction to mark-up languages (BA)
- Discourse analysis (BA)
- Pragmatics (BA)
- Methodology of linguistic research (BA)
|2000 – 2004||Department of Serbian Language, University of Belgrade:
- Contemporary Serbian III — Syntax (BA)
- Computational and mathematical linguistics (BA)
GRANTS AND SCHOLARSHIPS
|2020 – 2023||Movetia grant 2020-01MT-1-KA203074246a5 UPgrading the SKIlls of Linguistics and Language Students -- UPSKILLS (PI)|
|2018 – 2022||SNSF grant 176305 Non-randomness in morphological diversity: A computational approach based on multilingual corpora (PI)|
|2018 – 2019||Movetia grant 0012 Revisiting research training in linguistics: theory, logic, method (PI)|
|2016 – 2017||Hasler foundation grant 16038 Basic natural language processing for Swiss German texts (PI)|
|2015 – 2017||SNSF grant 160501 Regional linguistic data initiative (PI)|
|2008||Scholarship of the Department of General Linguistics, University of Geneva|
|2006 – 2008||Scholarship of the Swiss Federal Commission for Foreign Students|
|2002||Sasakawa scholarship for young leaders, realized at the universities of Birmingham, Duisburg, and Belgrade|
|2000||Serbian Ministry of Science and Technology research scholarship, realized at the Institute for Serbian Language|
|1995 – 1999||Serbian Ministry of Education students’ scholarship|
|Apr 2022||University of Helsinki, Text-based measures of language similarity|
|Feb 2022||IT University Copenhagen, Language families and similarity|
|Sep 2021||SIGTYP lecture series, Language sampling|
|Jun 2021||Mexican NLP Summer School 2021, Language (de)standardisation and NLP|
University of Milano-Bicocca, Searching for subword units in language processing and linguistic theory
University of Geneva, Interpretable word splits in language processing
University of Zurich, The impact of world knowledge on the use and the morphology of verbs
University of Munich, Verb aspect as linguistic encoding of time: a computational cross-linguistic approach
University of Geneva, The impact of geography on toponym frequency
University of Zagreb, Steps in building a corpus of Swiss German
BSNLP workshop (Hissar, Bulgaria), Aspect-based learning of event duration using parallel corpora
ACQDIV Project kickoff workshop (Kappel Abbey), The Bayesian learning framework
University of Stuttgart, Likelihood of external causation and the cross-linguistic variation in lexical causatives
|2013 – 2014||Editorial assistant, Computational Linguistics, Association for Computational Linguistics|
|2009 – 2013||Research assistant in Computational Linguistics, Department of General Linguistics, University of Geneva|
|2004 – 2013||Teaching assistant in Computational Linguistics and Applied Linguistics, Department of General Linguistics, University of Belgrade|
|2000 – 2004||Teaching assistant in Syntax and Computational Linguistics, Department of Serbian Language, University of Belgrade|
|2002 – 2004||Member of the team for developing Serbian Language educational curriculum, Ministry of Education of Serbia|
|2000||Research assistant in Lexicography, Institute for Serbian Language, Belgrade|
|1995 – 1999||Assistant editor of the Petnica Science Centre Linguistics Edition|
|1994 – 1999||Teaching instructor, Linguistics, Petnica Science Centre, Valjevo|
|2008 – 2013||PhD in Computational linguistics, University of Geneva
Thesis: Dynamics, causation, duration in the predicate-argument structure of verbs: A computational approach based on parallel corpora. Supervised by Prof. Paola Merlo.
|2006 – 2008||Postgraduate Studies in Computational linguistics, DEA, University of Geneva
Thesis: Light verbs and the lexical category bias of their complements. Supervised by Prof. Paola Merlo.
|1999 – 2004||Graduate Studies in Linguistics, MA Degree, University of Belgrade
Thesis: Reflexivization of transitive three valence verbs in novištokavski standard language diasystem. Supervised by Prof. Ljubomir Popović. (in Serbian)
|1994 – 1999||Diploma in Serbian Language, Literature and Linguistics, Faculty of Philology, University of Belgrade|
Programming: Python, Perl, Awk
Data analysis: R, Python
Mark-up: XML, XHTML, LaTeX
Serbian (native), English, French (fluent), German, Italian (intermediate), Slovenian (passive)