Corpus in GIScience: Going Beyond Butterfly Collecting

Workshop "Corpus in GIScience: Going Beyond Butterfly Collecting"

GIScience Conference, Melbourne, Australia, August 28-31, 2018.

Workshop Description and Scope

With the increasing availability of unstructured and semi-structured text data in the form of user-generated content and digitized corpora, GIScience is actively exploring the potential of these data to answer questions ranging from traditional to cutting-edge. Interestingly, the word “corpus” is seldom used in this research; “collection” is the preferred term. As the range of questions addressed through various corpora expands, it is high time for the GIScience community to gain an overview of emerging work and to reflect upon a number of important corpus-related questions, such as: Aren’t we “butterfly collecting”[1] at best and being opportunistic at worst? Do concepts such as “language in use” and “representativeness” matter, or should we take a pragmatic approach and only ask, “is corpus A good for task B”?

The aim of this workshop is to provide an overview of existing (types of) corpora, outline key methods and research questions addressed through text corpora in GIScience, as well as discuss the importance of aspects such as corpus characteristics or representativeness.

The scope of the workshop includes the following:

  • characteristics of existing geospatial and general corpora (including the Web as a corpus, digitized corpora, etc.)
  • corpus-building strategies and frameworks
  • making a corpus publicly available: tools and pitfalls
  • methods for spatial and thematic exploration of a corpus (geographic information retrieval and beyond)
  • development of spatial markup languages
  • approaches to annotation of large corpora (including crowdsourcing)
  • areas of application of corpus-based and corpus-driven research in GIScience (e.g. from environmental monitoring to the investigation of variation in the use of spatial language)

[1] Chomsky’s famous critique of corpus linguistics (Chomsky, N. 1979. Language and Responsibility: Based on Conversations with Mitsou Ronat. New York: Pantheon. Translated by John Viertel, p.57)



We invite contributions discussing (early) ongoing work relevant to one or more of the topics listed above. Submissions should be 6 to 8 pages in length, should follow the GIScience 2018 formatting guidelines, and should be submitted in PDF format through EasyChair. All submissions will be peer-reviewed by members of the program committee, and accepted papers will be included in the GIScience digital conference proceedings. The deadline for submissions is April 28, 2018. Decisions regarding acceptance will be provided to authors by May 29, 2018.


Programme Committee:

Ben Adams (University of Canterbury)

Tim Baldwin (University of Melbourne)

Christophe Claramunt (French Naval Academy Research Institute)

Mauro Gaio (University of Pau and Pays de l’Adour)

Morteza Karimzadeh (Ohio State University)

Parisa Kordjamshidi (Tulane University)

Bruno Martins (University of Lisbon)

Ludovic Moncla (French Naval Academy Research Institute)

Ross Purves (University of Zurich)

James Pustejovsky (Brandeis University)

Tanja Samardžić (University of Zurich)

Thora Tenbrink (Bangor University)

Jan Oliver Wallgrün (Pennsylvania State University)


For further questions, please contact the organizers: Ekaterina Egorova, Kristin Stock, and Lesley Stirling.