The ArchiMob Corpus

map

The ArchiMob corpus represents German linguistic varieties spoken within the territory of Switzerland. This corpus is the first electronic resource containing long samples of transcribed text in Swiss German, intended for studying the spatial distribution of morphosyntactic features and for natural language processing.

This corpus is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Release 2 (2019)

The new version of the ArchiMob corpus is now available. It features:

  • Newly transcribed documents (9 more than in the first release)
  • Speech-to-text alignment (contact us for audio sources)
  • Improved normalisation
  • Improved part-of-speech tagging

You can find more information on new features of the corpus in the Release 2 notes.

Access 

XML Download

Online query with NoSketch

Contact us for the audio sources.

Publications

Scherrer, Y., T. Samardžić, E. Glaser (2019). "Digitising Swiss German -- How to process and study a polycentric spoken language". Language Resources and Evaluation. (First online) 

Scherrer, Y., T. Samardžić, E. Glaser (2019). "ArchiMob: Ein multidialektales Korpus schweizerdeutscher Spontansprache". Linguistik Online 98(5), 425-454. https://doi.org/10.13092/lo.98.5947

Release 1 (2016)

Details of the corpus composition, formatting, and annotation can be found in the ArchiMob Release 1 Documentation (PDF, 317 KB).

Access

XML download (ZIP, 5411 KB)

Online query with NoSketch or ANNIS.

Publications

Samardžić, T., Y. Scherrer, E. Glaser (2016). "ArchiMob - A Corpus of Spoken Swiss German". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.

Samardžić, T., Y. Scherrer, E. Glaser (2015). "Normalising orthographic and dialectal variants for the automatic processing of Swiss German". In Proceedings of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Poznań, Poland.

DOI: https://doi.org/10.5281/zenodo.1158572

Map by Yves Scherrer