Tatyana Ruzsics

MA Tatyana Ruzsics

Doctoral student at the CorpusLab

Address: Freiestrasse 16, 8032 Zürich

Room number: FRF E 1

tatiana.ruzsics@uzh.ch

Tatyana Ruzsics (Soldatova) joined the URPP Language and Space in May 2016.

PhD project

Morphological typology through massive parallel corpora

This PhD project addresses the notion of morphological richness of languages in a large-scale morphological typological analysis using massively parallel corpora. Morphologically rich languages express multiple levels of information at the word level, so they are expected to show more variation in word types and, in turn, lower frequencies per word type. Measures based on the distribution of word types can therefore differentiate between morphologically rich and morphologically poor languages. However, the distribution of word types is only a partial indicator, since it does not distinguish between morphological and lexical diversity. A comparison based on word alignments, i.e. how many words in one language correspond to a word type in another language, is expected to distinguish between these two kinds of diversity. Because word boundaries are themselves uncertain, monolingual tests with different definitions of the word will be performed for a subset of languages.

The project will further examine how the resulting morphological richness measures vary in space. Describing the geographical distribution of similarities and differences between languages is one of the objectives of contemporary typology. The proposed research will thus provide tools and materials for addressing language contact effects and for potential further investigations of language evolution. The use of corpora is a valuable contribution to this research field, since most existing work is based on grammars.

The main research questions can therefore be expressed as:

  1. How are languages distributed on a morphological richness scale based on corpora?
  2. How is morphological richness distributed in geographical space?

Supervisors: Tanja Samardžić, Balthasar Bickel, Martin Volk

Funding source: URPP Language and Space

Publications

2017

Ruzsics, T. and T. Samardžić (2017). "Neural Sequence-to-sequence Learning of Internal Word Structure". In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184-194.

Makarov, P., T. Ruzsics, and S. Clematide (2017). "Align and copy: UZH at SIGMORPHON 2017 shared task for morphological reinflection". In Proceedings of the CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection. Vancouver, Canada, 49-57. Association for Computational Linguistics.

2016

Bentz, C., T. Ruzsics, A. Koplenig, and T. Samardžić (2016). "A comparison between morphological complexity measures: Typological data vs. language corpora". In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC). Osaka, Japan, 142-153. The COLING 2016 Organizing Committee.

Presentations

"Morphological segmentation", March 21, 2017, University of Zurich, Institute of Computational Linguistics Colloquium

Education

2016 - present

University of Zurich, CorpusLab, URPP “Language and Space”

PhD in General Linguistics  

Research topic: "Morphological typology through massive parallel corpora"

2016

ETH Zurich

CAS in Computer Science with a focus on Information Systems

2012 - 2015

ETH Zurich / University of Zurich

MSc in Quantitative Finance

2003 - 2008

Moscow State University 

MSc in Mathematics