The ArchiMob corpus

[Map]

The ArchiMob corpus represents the German varieties spoken in Switzerland. It is the first electronic resource containing long samples of transcribed Swiss German speech, and it is intended for studying the spatial distribution of morphosyntactic features and for natural language processing. The current version of the corpus contains 528 381 tokens.

 

 

Details of the corpus composition, formatting, and annotation can be found in the ArchiMob Release 1 Documentation (PDF, 317 KB).

Access

This corpus is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. If you wish to use the corpus for commercial purposes, please contact us.

Publications

Samardžić, T., Y. Scherrer, E. Glaser (2016) “ArchiMob - A Corpus of Spoken Swiss German”. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.

Samardžić, T., Y. Scherrer, E. Glaser (2015) “Normalising orthographic and dialectal variants for the automatic processing of Swiss German”. In Proceedings of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Poznań, Poland.

 

 

Map by Yves Scherrer