Corpus in GIScience: Going Beyond Butterfly Collecting

Call for Papers

Workshop "Corpus in GIScience: Going Beyond Butterfly Collecting"

GIScience Conference, Melbourne, Australia, August 28, 2018.

Workshop Description and Scope

With the increasing availability of unstructured and semi-structured text data in the form of user-generated content and digitized corpora, GIScience is actively exploring the potential of these data to answer questions ranging from the traditional to the cutting-edge. Interestingly, the word “corpus” is seldom used in this research; “collection” is the preferred term. As the range of questions addressed through various corpora expands, it is high time for the GIScience community to gain an overview of emerging work and to reflect on a number of important corpus-related questions, such as: Aren’t we “butterfly collecting”[1] at best and being opportunistic at worst? Do concepts such as “language in use” and “representativeness” matter, or should we take a pragmatic approach and only ask, “Is corpus A good for task B?”

The aim of this workshop is to provide an overview of existing (types of) corpora, to outline key methods and research questions addressed through text corpora in GIScience, and to discuss the importance of aspects such as corpus characteristics and representativeness.

The scope of the workshop includes the following:

  • characteristics of existing geospatial and general corpora (including the Web as a corpus, digitized corpora, etc.)
  • corpus-building strategies and frameworks
  • making a corpus publicly available: tools and pitfalls
  • methods for spatial and thematic exploration of a corpus (geographic information retrieval and beyond)
  • development of spatial markup languages
  • approaches to annotation of large corpora (including crowdsourcing)
  • areas of application of corpus-based and -driven research in GIScience (e.g., from environmental monitoring to the investigation of variation in the use of spatial language)

[1] Chomsky’s famous critique of corpus linguistics (Chomsky, N. 1979. Language and Responsibility: Based on Conversations with Mitsou Ronat. Translated by John Viertel. New York: Pantheon, p. 57.)



We invite two types of contributions:

  1. papers discussing (early) ongoing work relevant to one or more of the topics listed above. Submissions should be 6-8 pages long, should follow the GIScience 2018 formatting guidelines, and should be submitted through EasyChair (link for submission: ). All submissions will be peer-reviewed by members of the program committee; accepted papers will be presented at the workshop and included in the GIScience digital conference proceedings. The submission deadline is May 13, 2018. Authors will be notified of acceptance decisions by June 10, 2018.
  2. short position papers (3-4 pages) speculating about the specific nature, limitations, challenges or future of corpus-based and -driven approaches in GIScience. Submissions should be sent by e-mail to the workshop organizers (no specific format is required) by May 13, 2018. The papers will be reviewed by the organizers, and authors will be notified of acceptance decisions by June 10, 2018. Accepted position papers will not be included in the proceedings; they will be presented in the second part of the workshop and will form the basis for the discussion session.


Tentative Schedule:

9:15 Introduction. “On GIS, Corpora, and Butterflies”

Keynote. Parisa Kordjamshidi, Tulane University/Florida Institute for Human and Machine Cognition. “Corpus-based Spatial Information Extraction from Natural Language”

10:15 Research speed dating
11:00 Coffee break
11:30 Jingyi Xiao and Werner Kuhn “Thoughts on Geospatial Corpus”
12:00 Panel session “When Does a Collection Become a Corpus? When Does a Corpus Become Geospatial?” 

Lunch break


Keynote. Krzysztof Janowicz, University of California, Santa Barbara. Title TBA.

15:00 Simon Clematide, Ekaterina Egorova, Isabel Meraner, Ross S. Purves, Martin Volk. “Crowdsourcing Toponym Annotation for Natural Features: How Hard Is It?”
15:30 Coffee break
16:00 Panel session “From Digital Humanities to Digital Humanitarians: Present and Future of Corpus-based and -driven Approaches in GIS” 

Summary and take-home message

17:15 End


Programme Committee:

Ben Adams (University of Canterbury)

Tim Baldwin (University of Melbourne)

Christophe Claramunt (French Naval Academy Research Institute)

Mauro Gaio (University of Pau and Pays de l’Adour)

Morteza Karimzadeh (Ohio State University)

Parisa Kordjamshidi (Tulane University)

Bruno Martins (University of Lisbon)

Ludovic Moncla (French Naval Academy Research Institute)

Ross Purves (University of Zurich)

James Pustejovsky (Brandeis University)

Tanja Samardžić (University of Zurich)

Thora Tenbrink (Bangor University)

Jan Oliver Wallgrün (Pennsylvania State University)



Speaker: Parisa Kordjamshidi

Affiliation: Tulane University/Florida Institute for Human and Machine Cognition

Title: "Corpus-based Spatial Information Extraction from Natural Language"

Abstract: Natural language text is a rich source of spatial information, including geographical data. Automatically extracting this information is becoming increasingly important for real-world applications, for example, for the early detection of the location of events such as natural hazards. In this talk, I will discuss recent research efforts on the extraction of spatial information from natural language from a machine learning perspective. I will discuss a) recent annotation schemes such as SpatialML, Spatial Role Labeling, and ISO-Space; b) the types of textual corpora that we have annotated; c) the aspects of spatial information that are expressed in the currently annotated data; and d) the types of concepts that we are able to automatically extract from text using corpus-based techniques. I will also point to the state-of-the-art machine learning models that we have developed for spatial language understanding, as well as current research results and challenges.


For further questions, please contact the organizers: Ekaterina Egorova, Kristin Stock, and Lesley Stirling.