Applying tools from machine translation as well as neural network approaches, two members of the CorpusLab (one doctoral student and one MA student) will support the research team of the project What’s up, Switzerland (SNSF Sinergia project CRSII1_160714, lead: Elisabeth Stark, UZH) in normalizing the dialectal Swiss German data in their multilingual corpus of WhatsApp messages. Normalizing here means “translating” the data word by word, following a documented set of rules.
The approach will use data from another, manually normalized corpus (sms4science.ch) as well as from those parts of the WhatsApp corpus that have already been normalized by student assistants.
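To make the idea of word-by-word normalization from manually normalized data concrete, the following is a minimal sketch of a simple lookup baseline: each original form is mapped to the normalization it received most often in the training pairs, and unseen forms are passed through unchanged. This is not the project's method, which relies on machine translation tools and neural networks. The function names and the toy word pairs are invented for illustration and are not taken from either corpus.

```python
from collections import Counter, defaultdict


def train_lookup(pairs):
    """Count how often each original form was given each normalization."""
    counts = defaultdict(Counter)
    for original, normalized in pairs:
        counts[original.lower()][normalized] += 1
    # Keep only the most frequent normalization for each original form.
    return {orig: c.most_common(1)[0][0] for orig, c in counts.items()}


def normalize(tokens, lookup):
    """Normalize word by word; unseen tokens are returned unchanged."""
    return [lookup.get(tok.lower(), tok) for tok in tokens]


# Hypothetical toy training pairs (original form, normalized form).
toy_pairs = [
    ("isch", "ist"),
    ("isch", "ist"),
    ("nöd", "nicht"),
    ("hüt", "heute"),
]

lookup = train_lookup(toy_pairs)
result = normalize(["Hüt", "isch", "schön"], lookup)
print(result)
```

A lookup of this kind cannot handle forms that never occur in the training data and ignores context, which is one reason to use trained translation models instead.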
The resulting additional layer of information will be useful to researchers who want to search the corpus systematically for linguistic phenomena, since the original data do not follow any standardized spelling. The standardized spelling will also allow for better automated part-of-speech annotation than can be achieved with the non-standard original texts.
The co-operation runs for 6 months and is financed by the SNSF.