Junior Research Group "Hybrid Narrativity": Digital and Cognitive Methods for the Study of Graphic Literature

The Graphic Narrative Corpus (GNC)

This corpus collects 253 graphic narratives written in English and published in the United States, Great Britain, Canada, and India. The Graphic Narrative Corpus (GNC) was conceived for a research project that brought together literary scholars with cognitive and computer scientists at the universities of Paderborn and Potsdam from 2014 to 2020, and was funded by the German Federal Ministry of Education and Research (BMBF) under the title “Hybrid Narrativity.” The datasets that can be downloaded on this site make the corpus metadata available to an interested public. Given existing copyright laws in the European Union, we are unable to share the scans of these graphic narratives produced for the project, the full-length texts extracted from them, the results of eye-tracking studies, or further computational models. Work on this data continues, and we hope to make some of it available to other researchers in the future. Please get in touch with members of the project team if you are interested in further research on aspects of the data presented on this site.

In the definition used for corpus collection, graphic narratives are book-length comics that exceed 64 pages in length, tell one continuous story or several closely related stories, are aimed primarily at an adult readership, and form a single volume or a limited series (such as a trilogy). Included are fictional and non-fictional texts, such as graphic novels and memoirs, graphic journalism, and what we refer to as graphic fantasy, including comic books that belong to the superhero and science fiction genres. Historically, the GNC stretches from the mid-1970s, when the graphic novel started coming into its own in English, to 2017. For several reasons to do with the pop-cultural status of graphic narrative, it is currently almost impossible to know how many graphic narratives were published in this time period. Therefore, the project team drew on a wide range of sources in constructing the GNC. These are: international comics prizes (Eisner, Ignatz, Harvey, and the British Comics Award), academic databases (JSTOR and MLA), Amazon.com bestseller lists, online bibliographies (Grand Comics Database, Comicsvine), library collections (Library of Congress, the Billy Ireland Cartoon Library at Ohio State University), literary histories, international comics experts, and newspaper articles (Guardian, Time, etc.). By casting our net widely, we aimed to balance popularity and prestige and to offset the biases of individual sources. More information on sources can be found in the corpus metadata.

In addition, this site offers selected graphs that aggregate and visualize some of the metadata. The Books and Authors sections allow you to explore the corpus by individual title and author and include basic biographical and bibliographical information. The Graphs section presents interactive maps, charts, and historical overviews of the publication format of graphic narrative. You can look at the geographical spread of this form, explore differences by gender, or see which subgenres dominate the corpus. A monograph by Alexander Dunst, The Rise of the Graphic Novel: Computational Criticism and the Evolution of Literary Value (Cambridge UP, 2023), provides further information about the sampling process for the corpus, some of the computational methods of text and image analysis developed for the project, and results that offer a new theoretical understanding and historical periodization of graphic narrative. The project page also includes a publication list that covers several aspects of the research group's work across cognitive science, computer science, the digital humanities, and literary studies.