Editing Modernism in Canada


Posts Tagged ‘text analysis’

June 9, 2012

DH Tools and Proletarian Texts

(This post originally appeared on the Proletarian Literature and Arts blog.)

I’m at the Digital Humanities Summer Institute at the University of Victoria this week. I’m taking a course on the Pre-Digital Book, which is already generating lots of interesting ideas about how we think about and work with material texts, and how that is changing as we move into screen-based lives. There are, of course, many implications for how these differing textual modes relate to how we study and teach proletarian material, and more importantly, how class bears on these relationships. I hope to share some of these ideas as they have developed for me over the week. The course has taken up these questions in relation to medieval manuscripts and early modern incunabula and print, but the issues at stake are relevant for modern material as well. The instructors and librarians were kind enough to bring in a 1929 “novel in woodcuts” by Lynd Ward for me to look at – more on that will follow.

But, for fun, I also wanted to post about a little analysis experiment I did with some textual analysis tools.

I used the Voyant analysis tool to examine a set of Canadian manifesto writing. I transcribed six texts either from previous print publications or from archival scans for use as the corpus. These included: (1) “Manifesto of the Communist Parties of the British Empire”; (2) Tim Buck, “Indictment of Capitalism”; (3) CCF, “Regina Manifesto”; (4) Florence Custance, “Women and the New Age”; (5) “Our Credentials” from the first issue of Masses; and (6) Relief Camp Workers Strike Committee, “Official Statement”. [The RCWSC document remains my favorite text of all time.] Once applied, the tools let me read the texts in new ways, pulling out information or confirming ideas that I had about them in meaningful ways. You can find the summary of my corpus here.

The simplest visualization is the Cirrus word cloud, which at a glance shows that these texts are absolutely dominated by the language of class and politics (unsurprising, as they are aimed at remaking the existing class order). Michael Denning’s statement in The Cultural Front that the language of the 1930s became “labored” in both the public and metaphoric spheres is clearly reflected in this image.

[Image: Workers and capitalism, fighting it out]
Cirrus visualization of Canadian manifesto texts
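For readers curious about what sits underneath a word cloud, here is a minimal sketch of the counting step in Python. It is not Voyant's code: the tiny stop list, the function names, and the sample sentence are all placeholders of my own, and a real tool would use a much longer stop list and a more careful tokenizer.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption; real tools ship much longer ones).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "are", "by", "for"}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent tokens that are not stop words, with counts."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


# Made-up sample sentence, not a quotation from any of the manifestos.
sample = "The workers of the world and the workers of Canada are exploited by capitalism."
print(top_terms(sample, n=1))  # [('workers', 2)]
```

A word cloud then simply scales each word's size by its count.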

Looking at the differences among the materials, an analysis of distinctive words is a simple way to get at the position of a given text in relation to the others. We might think of these Canadian manifestos as occupying the same ground of debate (though they are not responding to one another directly), but not necessarily sharing the same tent. For example, the “Manifesto of the Communist Parties of the British Empire” shows a much higher concentration of the term “war”, which helps situate it later in the 1930s. The “Regina Manifesto” is overwhelmingly concerned with the “public” as it plans for a collective society. Florence Custance’s feminist statement stands out as more distinctive in its own time, as it uses “women” and female pronouns far more than the other texts do. And the Masses text betrays its literary periodical background with its heavier use of “art”.
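One rough way to get at "distinctive" words is to compare how often a word appears in one text with how often it appears in all the others. The Python sketch below does this with a smoothed ratio of relative frequencies. This is an assumption on my part about a reasonable measure, not a description of how Voyant calculates its own list, and the three-line toy corpus is invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_terms(corpus, title, n=3):
    """Rank words in one document by how much more often they occur there
    than in the rest of the corpus (ratio of add-one smoothed frequencies)."""
    target = Counter(tokenize(corpus[title]))
    rest = Counter()
    for other_title, text in corpus.items():
        if other_title != title:
            rest.update(tokenize(text))

    target_total = sum(target.values())
    rest_total = sum(rest.values())
    vocab_size = len(set(target) | set(rest))

    scores = {}
    for word, count in target.items():
        # Add-one smoothing keeps words absent from the other texts
        # from causing a division by zero.
        p_target = (count + 1) / (target_total + vocab_size)
        p_rest = (rest[word] + 1) / (rest_total + vocab_size)
        scores[word] = p_target / p_rest
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Invented toy corpus; the titles and sentences are placeholders.
corpus = {
    "a": "women work and women organize",
    "b": "workers work and workers strike",
    "c": "art and workers and art",
}
print(distinctive_terms(corpus, "a", n=1))  # ['women']
```

Words shared by every text, such as "and", score low, while words concentrated in one text rise to the top.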

The density of vocabulary in the texts can tell us something about the intended readership and purpose of a text. Masses plays with the linguistic conventions of the manifesto to develop a text that is both assertive and creative; accordingly, it uses the largest variety of words to do so. However, the RCWSC is not far behind in its forthright call to action, which tells me something interesting about the role of the imaginative mode in connecting revolution with creative acts. Buck’s “Indictment” is the least dense text. It’s also the longest, which makes for a highly repetitive text. The “Indictment” has a strong oral quality to it, commenting on Buck’s trial and defense with response and Marxist analysis. It is also highly indebted to the style of that analysis, parsing its terms minutely and using them for step-by-step explanations. It is in many ways the most didactic of the texts, as the word density suggests, though such analysis misses the purposeful element of the limited word choices. I find Buck’s repetition to have an incantatory quality connecting it more closely to spoken debate than the other texts, an impression that comes out of working with the text closely, while typing and re-typing, and reading it aloud for myself. Word density is not for me an assignation of value; rather, it is one of many ways of framing some thoughts on how these texts – and manifestos more broadly – employ particular rhetorical modes and how we can follow them through.
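Vocabulary density is usually understood as the number of different words divided by the total number of words, sometimes called the type-token ratio. Here is a minimal Python sketch on that assumption; the function names and sample phrases are mine and are not drawn from the manifestos. One caution worth keeping in mind: this ratio tends to fall as a text gets longer, so a long text like the “Indictment” will score lower partly because of its length.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_density(text):
    """Return unique words divided by total words (type-token ratio)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented sample phrases for illustration only.
print(vocabulary_density("strike strike strike now"))  # 0.5
print(vocabulary_density("we write and we fight"))     # 0.8
```

To compare texts of very different lengths more fairly, one option is to compute the ratio over equal-sized samples of each text rather than over the whole.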

Here is the link to the Voyant analysis of my manifestos. I invite you to take a look, play around, and consider throwing up some text from other working-class and proletarian sources. It seems to me that a lot of textual analysis begins by reaching for “important” texts – those that are canonical, or historical. The tools make no distinction – I would like to see more examples of writing from below feeding into the ways we think about texts in the DH realm.

June 9, 2010

A Voyeur’s Peep

To build on Stéfan Sinclair’s plenary talk at DHSI yesterday afternoon, I thought it appropriate to put Voyeur into action with some born-digital EMiC content. Perhaps one day someone will think to produce a critical edition of EMiC’s Twitter feed, but in the meantime, I’ve used a couple of basic digital tools to show you how you can take ready-made text from online sources and plug it into a text-analysis and visualization tool such as Voyeur.

I started with a tool called Twapper Keeper, which is a Twitter #hashtag archive. When we were prototyping the EMiC community last summer and thinking about how to integrate Twitter into the new website, Anouk had the foresight to set up a Twapper Keeper hashtag archive (also, for some reason, called a notebook) for #emic. From the #emic hashtag notebook at the Twapper Keeper site, you can simply share the archive with people who follow you on Twitter or Facebook, or you can download it and plug the dataset into any number of text-analysis and visualization tools. (If you want to try this out yourself, you’ll need to set up a Twitter account, since the site will send you a tweet with a link to your downloaded hashtag archive.) Since Stéfan just demoed Voyeur at DHSI, I thought I’d use it to generate some EMiC-oriented text-analysis and visualization data. If you want to play with Voyeur on your own, I’ve saved the #emic Twitter feed corpus (which is DH jargon for a dataset, or more simply, a collection of documents) that I uploaded to Voyeur. I limited the dates of the data I exported to the period from June 5th to early in the day on June 9th, so the corpus represents the #emic feed during the first few days of DHSI. Here’s a screenshot of the tool displaying Twitter users who have included the #emic hashtag:

#emic hashtag Twitter feed, 5-9 June 2010

As a static image, it may be difficult to tell exactly what you’re looking at and what it means. Voyeur allows you to perform a fair number of manipulations (selecting keywords, using stop word lists) so that you can isolate the information about word frequencies within a single document (as in this instance) or a whole range of documents. As a simple data visualization, the graph displays the relative frequency of the occurrence of Twitter usernames of EMiCites who are attending DHSI and who have posted at least one tweet using the #emic hashtag. To isolate this information I created a favourites list of EMiC tweeters from the full list of words in the #emic Twitter feed. If you wanted to compare the relative frequency of the words “emic” and “xml” and “tei” and “bunnies,” you could either enter these words (separated by commas) into the search field in the Words in the Entire Corpus pane or manually select these words by scrolling through all 25 pages. (It’s up to you, but I know which option I’d choose.) Select these words and click the heart+ icon to add them to your favourites list. Then make sure you select them in the Words within the Document pane to generate a graph of their relative frequency. If you want to see the surrounding context of the words you’ve chosen, you can expand the snippet view of each instance in the Keywords in Context pane.
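If you are wondering what a keywords-in-context view does behind the scenes, the idea is simple enough to sketch in a few lines of Python. This is only an illustration of the general technique, not Voyeur's code; the function name, the window width, and the sample text are my own placeholders rather than anything from the #emic feed.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of keyword as (left context, match, right context),
    with up to `width` words on either side."""
    # Keep # and @ attached so hashtags and usernames survive as single tokens.
    tokens = re.findall(r"[#@\w']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample text standing in for a downloaded hashtag archive.
sample = "Learning TEI at DHSI today. More TEI and XML tomorrow."
for left, match, right in keyword_in_context(sample, "tei", width=2):
    print(f"{left:>12} [{match}] {right}")
```

Each hit keeps the matched word as it appears in the text, so the search is case-insensitive but the display is not altered.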

Go give it a try. The tool’s utility is best assessed by actually playing around with it yourself. If you’re still feeling uncertain about how to use the tool, you can watch Stéfan run through a short video demo.

While you’re at it, can you think of any ways in which we might implement a tool such as Voyeur for the purposes of text analysis of EMiC digital editions? What kinds of text-analysis and visualization tools do you want to see integrated into EMiC editions? If you come across something you really find useful, please let me know (dean.irvine@dal.ca). Or, better, blog it!