For most of my academic career, I didn’t think much about methodology. I read, I think, I write (and rewrite, rewrite, rewrite). This changed when I took an introductory Digital Humanities course, a survey of digital tools and methods. My biggest takeaway from this course (other than that computers are frustrating) was that methodology affects not only the results of research, but also the way we think about our data and the types of questions we ask. This is not a new idea for many scholars, I know, but for those of us used to the read-think-write strategy, it bears thinking about.
Now, as I consider how best to approach research questions that I’m still struggling to formulate, methodology is on my mind again. I’m interested in how Romantic Gothic literature engages with its economic contexts; for example, I want to know how Gothic texts use words like debt, luxury, counterfeit, and interest. I took a course called Out-of-the-Box Text Analysis at DHSI 2015, hoping that text analysis could help me. Unfortunately, it can’t, at least not in the way I was expecting. The tools we used are great at distinguishing between texts and sets of texts, but they do this based on word frequency. The words I’m interested in are neither very frequent nor exceptionally infrequent, so although text analysis can help me identify words of interest and find where those words are used in my texts, it can’t help me answer my questions as I was hoping it would.
The problem I ran up against has to do with the assumption that the frequency with which words are used is important. This is true for the types of inquiries that these tools are generally used for, such as author attribution. For thematic studies like mine, though, the frequency of words is not a reliable measure of their significance. Take, for example, Radcliffe’s novel The Mysteries of Udolpho. Ask anyone what they remember most about the novel, and they’re likely to mention the veil: Emily lifts it and is horrified to see what she thinks is a corpse, but is really a wax model of a corpse (is that really any less creepy?). The idea of the veil permeates the novel, its mystery lingering on for Emily even after it’s resolved for the reader. The veil has a wider significance too, of course, as a symbol of Gothic dread—that feeling of wanting to know yet fearing to know—and as a symbol of ontological doubt, such as in Percy Shelley’s “Lift Not the Painted Veil.”
The image below is a word cloud showing the 75 most frequent words in Udolpho, excluding a standard list of stop-words such as pronouns, conjunctions, articles, and other common words (Sinclair). The word veil does not appear in this word cloud. In fact, it appears only 47 times in the text of 290 801 words, according to Voyant, the web-based text analysis tool I used to create this word cloud. Clearly, the frequency of the word in no way reflects the thematic or literary significance of the veil in the novel, a problem for those of us interested in using computation to explore thematic questions (at least those of us with novice-level text analysis skills).
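The counting behind a word cloud like this one can be sketched in a few lines of Python. This is only a minimal illustration, not Voyant's actual implementation: the stop-word list is a tiny hypothetical stand-in for the much longer standard list, the function name is my own, and the sample sentence is invented rather than quoted from Udolpho.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch;
# real tools such as Voyant use far longer lists).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in",
    "it", "she", "he", "her", "his", "was", "is", "that",
}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, skipping any that are stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Invented sample text, not a quotation from the novel.
sample = (
    "She lifted the veil, and the veil fell. The door was shut; "
    "the door was locked, and the door stayed closed."
)

freqs = word_frequencies(sample)
top = freqs.most_common(2)  # the two most frequent non-stop-words
print(top)
```

Even in this toy sample, door outranks veil simply because it is repeated more often, which is all that a frequency count can register.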
The frequency of a word in a text can be quantified quite easily, by simply counting it. But how can we quantify this other kind of significance, the resonance of an idea that a word represents? In my research, I’m interested in words related to property, an idea at the core of Udolpho. As with the veil, the resonance of the idea of property is not captured in a measure of word frequency. Estate does not appear in the word cloud, for example (although, intriguingly, door does). As it happens, veil and estate both occur just 47 times in the novel. Coincidence?
Another example of this disparity between frequency and resonance occurs in Mary Shelley’s novella Matilda. In it, a single phrase—“My daughter, I love thee/you”—has enormous resonance; the story hinges upon it. The phrase appears only twice, though: first, as an expression of Matilda’s wish that she would be united with her estranged father (159), and second, as her father’s confession of his incestuous lust for her (173). Once again, a measure of frequency does not capture the resonance of these words. Complicating the case is that each utterance of the phrase has a radically different valence that is not captured by the words on the page. This difference must be imagined by the reader, and depends on how the words are said; the words are the same both times, but Matilda understands by the way her father says them that the love she wants from him is not the kind that he feels. This difference is not captured in the text, much less in an analysis of word frequency.
All this is to suggest that it’s important to think about the assumptions upon which research methodologies are based. Text analysis and other digital methods and tools offer ways to analyze texts that the read-think-write method does not. In my experience, thinking about different ways of answering research questions can lead to new types of questions about what it is we study when we analyze a text, and how we can capture elements of a text that are not easily quantified and measured.
Radcliffe, Ann. The Mysteries of Udolpho. n.p. n.d. Project Gutenberg. Web. 10 October 2015.
Shelley, Mary. Matilda. Ed. Janet Todd. 2011. Proquest. Web. 10 October 2015.
Sinclair, Stéfan, and Geoffrey Rockwell. Voyant. Web. 9 October 2015. <http://voyant-tools.org>