The ArchiMob corpus represents German linguistic varieties spoken within the territory of Switzerland. This corpus is the first electronic resource containing long samples of transcribed text in Swiss German, intended for studying the spatial distribution of morphosyntactic features and for natural language processing.
This corpus is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Release 2 (2019)
The new version of the ArchiMob corpus is now available.
More information on the new features of the corpus can be found in the Release 2 notes.
Normalisation guidelines (in German): latest version
Online query with NoSketch Engine
Contact us for the audio sources.
Scherrer, Y., T. Samardžić, E. Glaser (2019). "Digitising Swiss German: How to process and study a polycentric spoken language". Language Resources and Evaluation. (First online)
Scherrer, Y., T. Samardžić, E. Glaser (2019). "ArchiMob: Ein multidialektales Korpus schweizerdeutscher Spontansprache". Linguistik Online, 98(5), 425-454. https://doi.org/10.13092/lo.98.5947
Release 1 (2016)
Details of the corpus composition, formatting, and annotation can be found in the ArchiMob Release 1 Documentation (PDF, 317 KB).
Samardžić, T., Y. Scherrer, E. Glaser (2016). "ArchiMob - A Corpus of Spoken Swiss German". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.
Samardžić, T., Y. Scherrer, E. Glaser (2015). "Normalising orthographic and dialectal variants for the automatic processing of Swiss German". In Proceedings of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Poznań, Poland.
Map by Yves Scherrer