The dictionary described here maps Standard German words to Swiss German pronunciations and spontaneous writings. It contains 11,248 Standard German words and their representations in six Swiss regional varieties: Zurich, St. Gallen, Basel, Bern, Visp, and Stans. Each regional variety is represented in two ways: a) as it is pronounced (SAMPA annotation) and b) as it is typically written in a non-standard, spontaneous fashion. The non-standard writing was produced partly manually by native speakers and partly automatically (using character-level sequence-to-sequence methods).
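As a rough illustration of the structure described above, one headword with two representations per regional variety could be modeled as follows. This is only a sketch: the class names, field names, and flat row layout are assumptions, not the distributed file format, and the angle-bracket strings are placeholders rather than real dictionary data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class Variant:
    """One regional representation of a Standard German word (hypothetical layout)."""
    sampa: str    # pronunciation in SAMPA annotation
    written: str  # spontaneous, non-standard spelling


@dataclass
class Entry:
    """A Standard German headword with its variants, keyed by region name."""
    standard: str
    variants: Dict[str, Variant] = field(default_factory=dict)


def build_dictionary(rows: Iterable[Tuple[str, str, str, str]]) -> Dict[str, Entry]:
    """Group flat (standard, region, sampa, written) rows by headword."""
    dictionary: Dict[str, Entry] = {}
    for standard, region, sampa, written in rows:
        entry = dictionary.setdefault(standard, Entry(standard))
        entry.variants[region] = Variant(sampa, written)
    return dictionary


# Placeholder rows: the angle-bracket strings stand in for real forms.
rows = [
    ("Haus", "Zurich", "<SAMPA form>", "<written form>"),
    ("Haus", "Bern", "<SAMPA form>", "<written form>"),
]
lexicon = build_dictionary(rows)
```

Keying the variants by region name keeps the lookup independent of how many varieties a given release covers.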
This dictionary was compiled as part of a research service provided by the University Research Priority Program (URPP) "Language and Space" to Swisscom AG. The contributors from the University were three students: Raphael Tandler, Alina Mächler, and Larissa Schmidt, supervised by Dr. Tanja Samardžić (Language and Space Lab, Text Group Leader). The collaborators from Swisscom AG were Lucy Linder, Sandra Djambazovska, and Alexandros Lazaridis, supervised by Dr. Claudiu Musat (Director of Research, Data, Analytics & AI).
The dictionary is available upon request under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Release 1 (2020)
Details of the dictionary's composition, formatting, and annotation can be found in the following two reports:
1st Report, Oct. 2018 - Jan. 2019: Mapping Standard German to Swiss German Pronunciations (PDF, 3722 KB)
The data set is distributed by Swisscom AG.
Schmidt, Larissa, Linder, Lucy, Djambazovska, Sandra, Lazaridis, Alexandros, Samardžić, Tanja, Musat, Claudiu (forthcoming): "A Swiss German Dictionary: Variation in Speech and Writing", In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France.
Map by Larissa Schmidt and Yves Scherrer