Dr. Tanja Samardžić

DIRECTOR, CORPUSLAB


RESEARCH INTERESTS

Quantitative text analysis; natural language processing for linguistic research.
Geometric approaches to language variation; micro- and macro-variation.
The interface between lexicon, morphology and syntax; lexical derivations; lexical semantics and pragmatics.

PUBLICATIONS

2017

Ruzsics, T. and T. Samardžić (2017). "Neural Sequence-to-sequence Learning of Internal Word Structure". In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184-194.

Bentz, C., D. Alikaniotis, T. Samardžić, and P. Buttery (2017). "Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology". Journal of Quantitative Linguistics 24(2-3), 128-162.

Samardžić, T., M. Starović, Ž. Agić, and N. Ljubešić (2017). "Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages". In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Valencia, Spain, 39-44.

2016

Ljubešić, N., T. Samardžić, and C. Derungs (2016). "TweetGeo - A tool for collecting, processing and analysing geo-encoded linguistic data". In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan.

Bentz, C., T. Ruzsics, A. Koplenig, and T. Samardžić (2016). "A comparison between morphological complexity measures: Typological data vs. language corpora". In Proceedings of the Workshop Computational Linguistics for Linguistic Complexity (CL4LC). Osaka, Japan.

Samardžić, T., Y. Scherrer, and E. Glaser (2016) “ArchiMob - A Corpus of Spoken Swiss German”. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.

Samardžić, T., and M. Miličević (2016) “A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora”. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.

Ljubešić, N., T. Erjavec, D. Fišer, T. Samardžić, M. Miličević, F. Klubička, and F. Petkovski (2016). "Easily accessible language technologies for Slovene, Croatian and Serbian". In Proceedings of the Conference on Language Technologies & Digital Humanities. Ljubljana, Slovenia.

2015

Samardžić, T., Y. Scherrer, and E. Glaser (2015) "Normalising orthographic and dialectal variants for the automatic processing of Swiss German", In Proceedings of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznan, Poland.

Samardžić, T., N. Ljubešić, and M. Miličević (2015) "Regional Linguistic Data Initiative (ReLDI)", In Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015), Hissar, Bulgaria.

Samardžić, T., R. Schikowski, and S. Stoll (2015) "Automatic interlinear glossing as two-level sequence classification", In Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Beijing, China, 68-72.

2014

Samardžić, T. and P. Merlo (2014) "Likelihood of External Causation in the Structure of Events", In Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL), Gothenburg, Sweden, 40-47.

Aepli, N., R. v. Waldenfels, and T. Samardžić (2014) "Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote", First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial), COLING 2014, Dublin, Ireland.

2013

Samardžić, T. and M. Miličević (2013) "Constructing a learner-friendly corpus-based dictionary of Serbian verbal aspect", Primenjena lingvistika 14, 77-89.

2012

Samardžić, T. and P. Merlo (2012) “The Meaning of Lexical Causatives in Cross-Linguistic Variation”, Linguistic Issues in Language Technology 7/12, CSLI Publications, 1-14.

Gesmundo, A. and T. Samardžić (2012) “Lemmatisation as a Tagging Task”, In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jeju Island, Korea, 368-372.

Gesmundo, A. and T. Samardžić (2012) “Lemmatising Serbian as Category Tagging with Bidirectional Sequence Classification”, In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.

OLDER

Conference presentations

Samardžić, T. (2017) "Basic natural language processing for Swiss German texts", SwissText 2017.

Simonović, M. and T. Samardžić (2013) "Exponence, productivity and default pattern - A study of verb aspect in Serbo-Croatian", International Congress of Linguists (ICL), Geneva, Switzerland.

Samardžić, T. (2009) “Light verbs and the lexical category bias of their complements”, Aston Corpus Conference for Postgraduate Researchers, Aston University, Birmingham, UK.

Theses

"Dynamics, causation, duration in the predicate-argument structure of verbs: A computational approach based on parallel corpora", PhD dissertation, University of Geneva.

"Semantic roles in natural language processing and in linguistic theory", Predoctoral thesis, University of Geneva.

“Light verbs and the lexical category bias of their complements”, DEA thesis, University of Geneva.

“Reflexivization of transitive three valence verbs in novištokavski standard language diasystem”, MA thesis, University of Belgrade (in Serbian).

TEACHING

Current

2014 – present Institute of Computational Linguistics, University of Zurich:
Techniques of semantic processing (MA)
Cross-linguistic transfer of lexical semantic representations (MA)

Former

2014 – 2015 Institute of Slavic Languages, University of Bern:
Automatic analysis of the languages of former Yugoslavia (BA)
2012 – 2013 Linguistics Department, LATL, University of Geneva:
Empirical methods and script languages (Python) (MA)
Artificial intelligence (BA)
2004 – 2012 Department of General Linguistics, University of Belgrade:
Introduction to General Linguistics (BA)
Applied linguistics (Awk) (BA)
Introduction to mark-up languages (BA)
Discourse Analysis (BA)
Pragmatics (BA)
Methodology of linguistic research (BA)
2000 – 2004 Department of Serbian Language, University of Belgrade:
Contemporary Serbian III — Syntax (BA)
Computational and mathematical linguistics (BA)

GRANTS AND SCHOLARSHIPS

2016 – 2017 Hasler foundation grant 16038 Basic Natural Language Processing for Swiss German Texts (PI)
2015 – 2017 SNSF grant 160501 Regional Linguistic Data Initiative (PI)
2008 Scholarship of the Department of General Linguistics, University of Geneva
2006 – 2008 Scholarship of the Swiss Federal Commission for Foreign Students
2002 Sasakawa scholarship for young leaders, realized at the universities of Birmingham, Duisburg, and Belgrade
2000 Serbian Ministry of Science and Technology research scholarship, realized at the Institute for Serbian Language
1995 – 1999 Serbian Ministry of Education students’ scholarship

PREVIOUS POSITIONS

2013 – 2014 Editorial assistant, Computational Linguistics, Association for Computational Linguistics
2009 – 2013 Research assistant in Computational Linguistics, Department of General Linguistics, University of Geneva
2004 – 2013 Teaching assistant in Computational Linguistics and Applied Linguistics, Department of General Linguistics, University of Belgrade
2000 – 2004 Teaching assistant in Syntax and Computational Linguistics, Department of Serbian Language, University of Belgrade
2002 – 2004 Member of the team for developing Serbian Language educational curriculum, Ministry of Education of Serbia
2000 Research assistant in Lexicography, Institute for Serbian Language, Belgrade
1995 – 1999 Assistant editor of the Petnica Science Centre Linguistics Edition
1994 – 1999 Teaching instructor, Linguistics, Petnica Science Centre, Valjevo

EDUCATION

2008 – 2013 PhD in Computational linguistics, University of Geneva
Thesis: Dynamics, causation, duration in the predicate-argument structure of verbs: A computational approach based on parallel corpora. Supervised by Prof. Paola Merlo.
2006 – 2008 Postgraduate Studies in Computational linguistics, DEA, University of Geneva
Thesis: Light verbs and the lexical category bias of their complements. Supervised by Prof. Paola Merlo.
1999 – 2004 Graduate Studies in Linguistics, MA Degree, University of Belgrade
Thesis: Reflexivization of transitive three valence verbs in novištokavski standard language diasystem. Supervised by Prof. Ljubomir Popović. (in Serbian)
1994 – 1999 Diploma in Serbian Language, Literature and Linguistics, Faculty of Philology, University of Belgrade

COMPUTER SKILLS

Programming: Python, Perl, Awk
Mark-up: XML, XHTML, LaTeX
Shell: Unix/Linux
Data analysis: R, Python

LANGUAGES

Serbian (native), English, French (fluent), German, Italian (intermediate), Slovenian (passive)