GerManC: A representative historical corpus of German 1650-1800

This project aims to compile a representative historical corpus of written German for the years 1650-1800. This is a crucial period in the development of the language, as the modern standard was formed during it, and competing regional norms were finally eliminated.

The central aim is to provide a basis for comparative studies of the development of the grammar and vocabulary of English and German and of the ways in which the two languages were standardised. It is intended as a research resource for a number of disciplines.

The structure and design of the GerManC corpus are intended to parallel similar historical linguistic corpora of English, notably the ARCHER corpus and the Helsinki corpus of English texts.

Project details

The ARCHER corpus is being developed by an international team of scholars, including Professor David Denison and Dr Núria Yañez Bouza in Linguistics and English Language in Manchester. The GerManC project collaborated with them to ensure the maximum degree of comparability between the corpora.

The idea for the project goes back to an initiative by Anita Auer (now at the University of Utrecht), who completed a doctorate in Manchester in 2005. Dr Auer’s work drew attention to the lack of corpus-based data for German during this period compared to English. She suggested undertaking the compilation of such a corpus for German and completed work on it.

In the development of the GerManC corpus, consistent attention was paid to maintaining compatibility with corpus projects in Germany that cover historical stages of German, initially within the framework of the DDD project (Deutsch Diachron Digital), and latterly with the various parts of the Historisches Referenzkorpus des Deutschen, which are being compiled at various centres in Germany.

Following the model of the ARCHER corpus, and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers) and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). In order to facilitate tracing historical developments, the whole period was divided into fifty-year sections (in this case 1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods. This periodisation follows the model established for the Bonn corpus of Early New High German, which is being updated as part of the Historisches Referenzkorpus des Deutschen. Given the areal diversity of German during this period, the corpus also aimed for representativeness in respect of region. To this end, five broad regional divisions were adopted for the GerManC corpus, i.e. North German, West Central German, East Central German, South-West German (including Switzerland) and South-East German (including Austria), and an equal number of texts was taken for each genre and sub-period from each of these five regions.
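The sampling design described above can be pictured as a grid of genre, sub-period and region. The following Python sketch is purely illustrative and is not part of the project's own tooling: the names design_cells and approximate_size are hypothetical, and the number of texts per cell is left as a parameter because it is not stated here.

```python
from itertools import product

# Categories as listed in the description of the corpus design.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
REGIONS = [
    "North German", "West Central German", "East Central German",
    "South-West German", "South-East German",
]

# Samples are "about 2000 words", so any total is only approximate.
SAMPLE_WORDS = 2000


def design_cells():
    """Return every (genre, period, region) combination in the design."""
    return list(product(GENRES, PERIODS, REGIONS))


def approximate_size(texts_per_cell):
    """Return (number of texts, approximate word count).

    texts_per_cell is an assumption supplied by the caller; the
    description only says the number is equal across cells.
    """
    n_texts = len(design_cells()) * texts_per_cell
    return n_texts, n_texts * SAMPLE_WORDS


if __name__ == "__main__":
    print(len(design_cells()), "cells in the design grid")
    print(approximate_size(1))
```

The grid has 8 x 3 x 5 = 120 cells. With one text per cell (an illustrative value only), that would give 120 samples and roughly 240,000 words; the actual size depends on how many texts were taken per cell.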

An initial stage, the GerManC pilot project comprising newspaper texts only, was supported by the Economic and Social Research Council from 1 March 2006 to 31 March 2007 (grant no. RES-000-22-1609), with Professor Martin Durrell as Principal Investigator, Dr Paul Bennett as Co-Investigator, and Dr Astrid Ensslin as Research Associate. Dr Ensslin left at the end of the pilot project to take up a post at Bangor University, where she is Senior Lecturer in Digital Humanities.

Following a positive evaluation of the pilot, the full project was approved for support by the ESRC jointly with the AHRC in early 2008 (grant no. RES-062-23-1118). Work started in September 2008, with Dr Silke Scheible and Dr Richard J. Whitt joining Professor Martin Durrell and Dr Paul Bennett as Research Associates, and ended in August 2012 with the completion of the projected corpus, which can be downloaded from the Economic and Social Data Service Data Archive and the Oxford Text Archive. Dr Whitt has subsequently worked for the project Visualizing English Print from 1470-1800 under the direction of Professor Jonathan Hope at the University of Strathclyde, and Dr Scheible is working under the direction of Dr Sabine Schulte im Walde on the project Distributional Approaches to Semantic Relatedness at the Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart.