Creating the Intertextual Networks Genre Taxonomy

By Kenny Oravetz, NULab Fellow

As part of my work as a NULab fellow during my first year in the PhD program, I was given the opportunity to conduct digital research affiliated with any of the projects under the NULab umbrella. I chose the Women Writers Project (WWP) because its stability, infrastructure, and familiar leadership ensured a smooth entry point for someone new to digital humanities techniques. Through the WWP, I was able to pursue my interest in genre and classification while building my skills as a digital researcher and making the collection more useful through additional metadata.

I joined the WWP in the middle of the Intertextual Networks project, an effort to create a comprehensive bibliography of the works cited, quoted, and alluded to by the authors in the main WWP collection of pre-Victorian women’s writing. Following my interest in genre classification, I decided to create a taxonomy of genres for the cited texts in the Intertextual Networks bibliography. The end result is a list of genres tailored to the collection, along with a classification of each text by the genre or genres it belongs to and the gender of its author. This labeling made it possible to compute statistics on the bibliography, such as which genres WWP authors cite most often and which genres correlate with which author genders, in order to draw broader conclusions about early women’s writing. The taxonomic process was also productive in its own right, yielding broadly applicable insights into classification methodology, the limits of genre labeling for early modern texts, and the side benefits of labeling archives, including introducing new researchers such as myself to digital archives and research.
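To make the kind of statistics mentioned above more concrete, here is a minimal Python sketch that counts genre labels and cross-tabulates them by author gender. The record layout, the field names (`genres`, `author_gender`), and the sample entries are hypothetical and are not drawn from the actual Intertextual Networks data. Because one text can carry several genres, each label is counted once per text.

```python
from collections import Counter, defaultdict

# Hypothetical records: each cited text carries one or more genre labels
# plus the gender of its author (illustrative data, not the real bibliography).
records = [
    {"title": "Text A", "genres": ["poetry"], "author_gender": "female"},
    {"title": "Text B", "genres": ["theology", "poetry"], "author_gender": "male"},
    {"title": "Text C", "genres": ["tragedy"], "author_gender": "male"},
    {"title": "Text D", "genres": ["unknown"], "author_gender": "unknown"},
]

def genre_counts(records):
    """Count how many cited texts carry each genre label (multi-label aware)."""
    counts = Counter()
    for rec in records:
        # set() ensures a label listed twice on one text is counted only once
        counts.update(set(rec["genres"]))
    return counts

def genre_by_gender(records):
    """Cross-tabulate genre labels against author gender."""
    table = defaultdict(Counter)
    for rec in records:
        for genre in set(rec["genres"]):
            table[genre][rec["author_gender"]] += 1
    return table

if __name__ == "__main__":
    print(genre_counts(records).most_common())
    for genre, by_gender in sorted(genre_by_gender(records).items()):
        print(genre, dict(by_gender))
```

Raw counts like these still need careful interpretation, since texts labeled “unknown” may hide members of any genre.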

The first step of the project was to create an initial list of genres into which the cited texts could be sorted. I worked with Intertextual Networks project leaders Sarah Connell and Ashley Clark to brainstorm a list of genres based on our prior knowledge of the bibliography and the period it covers. I then dove into the bibliography, researching texts at random intervals to find genres we may have missed, which we added to the list. In this phase and those that followed, our discussion of the Intertextual Networks genre taxonomy hinged on two main issues: specificity and time period. Left unresolved, either issue could have made the taxonomy inconsistent and unreliable for large-scale analysis.

Both issues were closely tied to the problem of unfindability: many of the works cited in the bibliography have been lost to time or would be too costly to track down with our resources. In these cases, all I had to go on while classifying was the title of the work and any information about it mentioned by other authors. The large number of such works makes it largely impossible to use specific sub-genre labels consistently, because many texts that rightly belong to a sub-genre lack the surviving information needed to classify them that way. For example, I came across multiple unfindable collections of poetry titled simply “Poems,” which could have contained any poetic genre. If I had been labeling sonnets and one of these collections turned out to consist entirely of sonnets, it would have gone uncounted, and the number of sonnets in the bibliography would have been too low. Thus, rather than distinguishing subgenres like “sonnets,” “odes,” and “elegies,” we opted for the single label “poetry.” The same logic holds for other broad categories like “novel” and “theology,” the latter covering prayer books, commentaries, sermons, and more.

However, there are two exceptions to this broad level of specificity. The first is the “drama” category, which is divided into “comedy” and “tragedy.” Plays in the collection are more readily accessible today than texts in other genres, so they can easily be investigated for genre information. Many plays from the period also state in their titles whether they are comedies or tragedies, which makes labeling straightforward. The second exception is the “scientific-writing” category. Given the time period of the collection, we initially thought this category would be precise enough, but once I began working through the bibliography, I found that scientific works could be easily and accurately divided into disciplines like “psychology,” “geology,” and “mathematics” on the basis of their extended titles. Finally, as this set of categories indicates, we decided to develop a flexible taxonomy that labels texts on the basis of both their genres and their primary subject areas (that is, a text might be tagged with a genre such as tragedy or with a subject such as mathematics). We wanted people to be able to locate texts in the bibliography by a range of criteria, without being overly concerned about policing the boundaries of what does or does not constitute a genre. For simplicity, I refer to “genres” below, but more properly, the taxonomy we developed identifies texts by both their genres and their subjects.

This division of “scientific-writing” into multiple categories raises questions about our contemporary relationship to the intertextual collection. Do I see these scientific works as belonging to separate genres, each needing its own label, because of the prominence of the sciences today? Would readers, writers, and scholars of the time have considered the theological texts I grouped together more deserving of separate categories than these scientific writings? In other words, the scientific-writing category forced me to grapple with whether the genre labels were imposed on the collection through the conventions of my own time or were true to the conventions of the early modern period. I did not arrive at a definite conclusion; because so many theological texts were unfindable, labeling theological subgenres was impossible, regardless of any historical imperative for it.

Unfindable texts also strain the genre project in other ways. A few dozen texts had to be labeled “unknown” because there was not enough information available to discern what they were. While this may leave some genres statistically under-represented, there was no efficient way within our resources to label these texts, so we had no other option.

Below are the lists used in and produced by this genre project. The first is the list of genres as it stood when I began labeling, and the second is the list of genres added to the final taxonomy once I had labeled every entry in the bibliography. A genre is included in the taxonomy only when more than one text falls under its label; otherwise, texts are labeled “other-fic” or “other-nonfic.” For instance, James VI’s “Counterblaste to Tobacco” was the sole text on drug regulation, so it was categorized as “other-nonfic.”

Initial Genres (Alphabetized) | Added Genres (In order of addition)
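The rule for rare genres described above can be expressed as a short procedure. The sketch below is a hypothetical one-pass version in Python, assuming each text’s labels are stored as a list and that a separate per-text flag (`is_fiction`, a hypothetical field) records whether the text is fiction; in the project itself, this decision was made by hand as the taxonomy grew. The labels `drug-regulation` and `fable` and the text ids are illustrative only.

```python
from collections import Counter

def collapse_singleton_genres(labels, is_fiction):
    """Replace genres used by only one text with "other-fic" or "other-nonfic".

    labels: dict mapping a text id to its list of genre labels (hypothetical format)
    is_fiction: dict mapping a text id to True/False (hypothetical per-text flag)
    """
    # Count how many distinct texts carry each genre label.
    usage = Counter(genre for genres in labels.values() for genre in set(genres))
    collapsed = {}
    for text_id, genres in labels.items():
        new_genres = []
        for genre in genres:
            # A genre stays in the taxonomy only if more than one text uses it.
            if usage[genre] < 2:
                genre = "other-fic" if is_fiction[text_id] else "other-nonfic"
            if genre not in new_genres:  # avoid duplicate fallback labels
                new_genres.append(genre)
        collapsed[text_id] = new_genres
    return collapsed

# Illustrative data only; "drug-regulation" and "fable" are hypothetical labels.
labels = {
    "counterblaste": ["drug-regulation"],
    "text-2": ["poetry"],
    "text-3": ["poetry", "theology"],
    "text-4": ["theology"],
    "text-5": ["fable"],
}
is_fiction = {"counterblaste": False, "text-2": True, "text-3": True,
              "text-4": False, "text-5": True}

collapsed = collapse_singleton_genres(labels, is_fiction)
print(collapsed["counterblaste"])  # ['other-nonfic']
```

A text keeps any shared genres while only its rare label is collapsed. Because a genre can cross the two-text threshold as more entries are labeled, a procedure like this would need to be rerun over the full set of labels rather than applied once per text.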

I draw several insights from these lists:

  • Many of the added genres deal with specific scientific and socio-scientific disciplines, such as zoology, geology, and anthropology. These disciplines are often stated explicitly in the titles of their works, which makes such categorization possible, and they occur often enough to merit their own labels. We did not initially include these genres because we were not sure they were present; in an early draft of our taxonomic planning, I wrote that we should “add other disciplines as they appear.”
  • Many of the added genres reflect that the collection is of early women’s writing, as they deal with subjects considered “feminine” at the time, such as “education,” “gardening,” “childcare,” and “cookbook.” While these genres were too specific to come up in our initial brainstorm, they occur regularly in the bibliography, perhaps because of the collection’s emphasis on women writers.
  • Two genres were added both because they occur regularly and because they serve the feminist goals of the WWP: “gender-commentary” and “gender-addressed,” the latter being any work addressed to a specific gender.
  • Other added genres are less common today and so, in a sense, had to be “rediscovered” or remembered. These include “reference” (often forgotten in the internet age), “conduct-manual,” and “compendium.” “Chorography” made the initial list only because it turned up in one of our early investigations of a portion of the bibliography.

Overall, the creation of this taxonomy yielded several takeaways that may be useful for others creating genre taxonomies for large archives. I found that the specificity of a taxonomy must be based on how much information is available about the texts it covers. This specificity must be largely uniform so that analytical results are not misleading and so that researchers do not force specific labels onto texts without specific information. Relatedly, scholars should not force labels onto texts whose genre is unknown, but should instead accept the ambiguity and mark it as such; an unfindable book whose title begins with “A History” is not necessarily in the history genre. Wrong guesses lead to wrong results, and when a text is lost to time, there is no way to check whether an inference is correct.

Additionally, when developing a taxonomy, scholars should take historical accuracy into account when defining genres, ensuring that genres that are essentially dead today are still labeled and that researchers know how to recognize them. The taxonomy must also be expanded flexibly during labeling as new genres emerge, particularly genres relevant to the time period of the corpus. It may also be useful to highlight genres that relate to the major goals of the archive, as I did here with “gender-commentary.”

I believe this genre labeling project and similar projects are worthwhile for four main reasons. First, they allow scholars to run topical statistical analyses on their archival collections; the WWP analysis of these Intertextual Networks genres is forthcoming. Second, they create an additional layer of metadata for referencing, searching, and organizing the collection. Third, assigning genre labels to texts is a form of proofreading the archive at close range: during the process, I found and fixed many small errors that otherwise might have been overlooked. Finally, genre labeling is an excellent way to introduce new researchers (like me) to an archive through straightforward, hands-on work. I was extremely pleased with my decision to work with the WWP because of how well it introduced me not only to digital archival work but also to the larger field of digital humanities and encoding. These side benefits, along with the tangible classification results, make a strong case for creating genre taxonomies for archival collections.
