Graphs in Romanticism Research

We are all aware of the hand-wringing that accompanies humanities scholarship in the early 21st century. Soon enough there will be another article announcing the death or worthlessness of the humanities degree. Subsequently there will be a rebuttal pointing out how crucial the humanities are. And the cycle will continue. I am not trying to disparage that particular discussion, but I want to point it out as a symptom of the larger problem of how the humanities interface with the public. To the public, the humanities do not seem to produce anything concrete; of course that is not true, but the perception is hard to overcome. One way of overcoming it might be to offer alternative perspectives on our data. To that end, I want to consider the graph as a way of furthering humanities research. My goal here is to continue the discussion about whether the graph can be a useful research tool for Romanticism; I am not sure that it will be, but to understand the advantages and pitfalls of a new methodology we need to have the discussion first.

One last item, before we go too much further: I would be remiss not to note Graphs, Maps, Trees: Abstract Models for Literary History by Franco Moretti. That book and its various responses really started this particular conversation. I hope to focus the conversation on a particular tool, though, which is the Google Ngram Viewer. As you all are aware, the Ngram Viewer searches the text that Optical Character Recognition (OCR) has extracted from Google’s database of digitized books. The Ngram Viewer is not perfect, to say the least. For example, until recently it frequently confused the long ‘s’ with an ‘f’. That being said, the Ngram Viewer does have some powerful tools available, allowing you to search not only for various words, but also for parts of speech, the most popular following words, and so on.

Here is a graph that charts the ‘Big Six’ from 1789 until 1912:

If you would like to see the original graph it is here: Blake,Wordsworth,Coleridge,Byron,Shelley,Keats Original. Among other things, this graph can tell us that Blake started off as the most popular, but that Lord Byron was the most popular of the six throughout the long nineteenth century, although there were a few moments where Shelley, Wordsworth, and even Coleridge overtook him. And that Keats … was not quite as popular.

Or, at least it would be nice if the graph told us that. Because the viewer matches words in the OCR text rather than particular people, though, every mention of a word is gathered. So a search for ‘Shelley’ will collect not only Percy, but also Mary, and their children, and extended family, or just anyone else named Shelley. Names that are more distinctive, like Wordsworth and Byron, are probably closer to representing the writers I was looking for. But those searches will still gather mentions of other Lord Byrons and other Wordsworths, like Dorothy. For the purpose of searching for proper nouns, the more distinctive the better. For example, here is a graph of more distinctive book titles:

The original graph is here: Pride and Prejudice,The Bride of Lammermoor,Mansfield Park,Frankenstein,Sense and Sensibility,The Last Man,Guy Mannering Graph. This data is a bit more valuable, because the likelihood of someone writing ‘Frankenstein’, or the words ‘Pride and Prejudice’ together, to refer to something other than the books is smaller (although not impossible). Noticeably, Sir Walter Scott’s novel is quite popular, although so too is Shelley’s. And, admittedly, if I extended the graph into the 20th century, Jane Austen’s novels would be more prominent.
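For anyone who wants to keep a record of exactly which search produced a graph, the query can be written down as a link. What follows is only a minimal sketch: the parameter names (content, year_start, year_end, smoothing) and the default smoothing of 3 are assumptions based on the links the Ngram Viewer itself appears to generate, not a documented interface, and the function name ngram_url is my own invention.

```python
from urllib.parse import urlencode

# Assumption: this address and these parameter names mirror the links the
# Ngram Viewer produces in the browser; they are not a documented API.
BASE_URL = "https://books.google.com/ngrams/graph"


def ngram_url(terms, year_start, year_end, smoothing=3):
    """Build a shareable Ngram Viewer link for a list of search terms."""
    query = {
        "content": ",".join(terms),  # comma-separated, as in the search box
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,      # assumed default; adjust as needed
    }
    return BASE_URL + "?" + urlencode(query)


big_six = ["Blake", "Wordsworth", "Coleridge", "Byron", "Shelley", "Keats"]
print(ngram_url(big_six, 1789, 1912))
```

The same function should work for the book titles, since urlencode takes care of the spaces inside a phrase like ‘Pride and Prejudice’.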

The Ngram Viewer can also do a wildcard search, which I did with the word ‘French’ below:



Again, here is the original: French * Graph. At least for this time period, the most frequent word to follow the word ‘French’ is ‘and’. That result is not particularly surprising, though, as ‘and’ is a fairly common word. What did surprise me was that between 1812 and 1818 ‘army’ followed ‘French’ more frequently than ‘and’. Of course, Napoleon was attempting to conquer the rest of Europe during part of that period (minus Elba). But I think it is interesting that the concern or interest in the French army was so great that it surpassed an everyday usage. If someone were writing about how, in a particular text, one can see the anxiety over the French army, this graph might help them reinforce their point.
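Spotting the years in which one line rises above another is easy to do by eye, but it can also be checked mechanically once the yearly values are in hand. This is a rough sketch under stated assumptions: it supposes you already have two lists of yearly frequencies of equal length, the function name years_where_exceeds is hypothetical, and the numbers below are invented placeholders, not output from the Ngram Viewer.

```python
def years_where_exceeds(series_a, series_b, start_year):
    """Return the years in which series_a is strictly greater than series_b.

    Both series hold one value per year, beginning at start_year.
    """
    return [
        start_year + offset
        for offset, (a, b) in enumerate(zip(series_a, series_b))
        if a > b
    ]


# Invented placeholder values for illustration only (not real Ngram data).
first = [1.0, 2.0, 3.0, 2.0, 1.0]
second = [2.0, 2.0, 2.0, 2.0, 2.0]

print(years_where_exceeds(first, second, 1800))  # prints [1802]
```

A comparison like this only says when one phrase outpaced another; what that means is still a question for interpretation.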

I would also like to point out the “Search in Google Books” section. If you were to click on any of those date ranges, Google would take you to the books where it found the word in question. That search section can also show what kind of results the search is generating, for instance whether ‘Blake’ refers to William or to other Blakes.
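If you copy a handful of passages out of those results, a quick tally of the word that comes directly before a surname gives a rough sense of who is actually being counted. The sketch below is only an illustration: the function name words_before is hypothetical, the snippets are sentences I wrote by hand rather than text taken from Google Books, and looking at a single preceding word will miss references like ‘Mrs. Shelley’ or a bare ‘Shelley’.

```python
import re
from collections import Counter


def words_before(surname, snippets):
    """Count the word that immediately precedes each mention of surname."""
    pattern = re.compile(r"\b(\w+)\s+" + re.escape(surname) + r"\b")
    counts = Counter()
    for snippet in snippets:
        counts.update(pattern.findall(snippet))
    return counts


# Hand-written example sentences, standing in for passages copied
# from the search results (an assumption, not real search output).
snippets = [
    "Mary Shelley published Frankenstein in 1818.",
    "Percy Shelley drowned in 1822.",
    "the poems of Percy Shelley",
]

print(words_before("Shelley", snippets).most_common())
```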

Although this is a brief meditation, there are a few items that I would like to focus on. First off, I think it is plain that these graphs are no substitute for the closer readings that people in the humanities often perform. And there are problems with the graphs: they cast a net that is a bit too wide. There are, though, a few interesting advantages; for one, these graphs can help show very large historical shifts. The viewer can also help with a very formalist study, because of its ability to search by parts of speech (which I did not touch on here). But for the moment, I think that the very broad perspective of the Ngram Viewer might be useful to humanities research, in that it would help us illuminate historical trends just a little bit better. Graphs, and the Ngram Viewer tool, are certainly not perfect, nor can they replace our normal methodologies, but they do have some potential for humanities research.

-Kent Linthicum