Tatyana Ruzsics (Soldatova) joined the URPP Language and Space in May 2016. She is a doctoral student in the project Upstream Text Processing, supervised by Tanja Samardžić and co-supervised by Martin Volk and Balthasar Bickel. Her research interests include deep learning methods for upstream NLP tasks: writing normalization, lemmatization, morphological segmentation and morphological reinflection. She works on character-level neural machine translation methods that process information at multiple levels of text organization (characters, morphemes, words, sentences) in combination with structural information (multilevel statistical language models and recurrent neural networks, linguistic annotation) from heterogeneous resources (noisy text, dictionaries).
Publications
2019
Ruzsics, T., M. Lusetti, A. Göhring, T. Samardžić and E. Stark (2019). "Neural text normalization with adapted decoding and PoS features". Natural Language Engineering, 585-605. Cambridge University Press. Pre-print
2018
Lusetti, M., T. Ruzsics, A. Göhring, T. Samardžić and E. Stark (2018). "Encoder-Decoder Methods for Text Normalization". In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (COLING 2018). Santa Fe, New Mexico, USA, 18-28. Association for Computational Linguistics.
2017
Ruzsics, T. and T. Samardžić (2017). "Neural Sequence-to-sequence Learning of Internal Word Structure". In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184-194. Association for Computational Linguistics.
Makarov, P., T. Ruzsics, and S. Clematide (2017). "Align and copy: UZH at SIGMORPHON 2017 shared task for morphological reinflection". In Proceedings of the CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection. Vancouver, Canada, 49-57. Association for Computational Linguistics. Overall winner of task 1.
2016
Bentz, C., T. Ruzsics, A. Koplenig, and T. Samardžić (2016). "A comparison between morphological complexity measures: Typological data vs. language corpora". In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (COLING 2016). Osaka, Japan, 142-153. Association for Computational Linguistics.
Presentations
"Encoder-Decoder Methods for Text Normalization", SwissText 2018, ZHAW, Winterthur
"Morphological segmentation", March 2017, Institute of Computational Linguistics Colloquium, University of Zurich
"Morphological richness through massive parallel corpora", with T. Samardžić, September 2016, URPP Language and Space, Second Meeting with Scientific Advisory Board, University of Zurich
Education
2016 - present: PhD, University of Zurich, URPP "Language and Space", Text Group
2016: CAS in Computer Science with a focus on Information Systems, ETH Zurich
2012 - 2015: MSc in Quantitative Finance, ETH Zurich / University of Zurich
2003 - 2008: MSc in Mathematics, Moscow State University