IMTweet

Dean Irvine — Fri, 11 Jun 2010 15:52:20 +0000

#emicClouds

A little DHSI playtime for you. First, two word clouds: one of the DHSI Twitter feed, the other of the EMiC Twitter feed. Both feeds were collected using the JiTR webscraper, a beta tool in development by Geoffrey Rockwell at the University of Alberta.

#emic Twitter Feed in JiTR

How did I do this? First I scraped the text from the Twapper Keeper #dhsi2010 and #emic archives into JiTR. I did this because I wanted to clean it up a bit, take out some of the irrelevant header and footer text. Because JiTR allows you to clean up the text (which is not an option in the Twapper Keeper export) you don’t have to work with messy bits that you don’t want to analyze. After that I saved my clean texts and generated what are called “reports.” The report feature creates a permanent URL that you can then paste into various TAPoRware tools. I ran the reports of the #dhsi2010 and #emic feeds through two TAPoRware text-analysis tools, Voyeur and Word Cloud.

#emic Twitter Feed in TAPoR Word Cloud

#dhsi2010 Twitter Feed in TAPoR Word Cloud

If you want to generate these word clouds and interact with them, paste the report URLs I generated using JiTR into the TAPoR Word Cloud tool.

[#emic]

http://ra.tapor.ualberta.ca/~jitr/contents/show_report/4848?key=921556750373594176491

[#dhsi2010]

http://ra.tapor.ualberta.ca/~jitr/contents/show_report/4847?key=78555118459803099008
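
For anyone curious about what a word cloud is counting under the hood, here is a minimal Python sketch. It is not how TAPoR Word Cloud is implemented. The stop word list, the tokenizing pattern, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "rt", "at", "for", "on"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercased word tokens (keeping # and @ prefixes), skipping stop words."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Placeholder text, not a real tweet from either feed.
sample = "RT @emic: Editing the IMT demo at #dhsi2010. IMT is fun! #emic"
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
print(freqs.most_common(3))
```

A word cloud then scales each word's font size by its count, which is why cleaning out header and footer text before counting matters.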

If you want to try JiTR and do some webscraping and aggregating on your own, let me know and I’ll put you in touch with Geoffrey Rockwell. You’ll need a username and password to test it out.

Gender Peeps

One of the many things that caught my attention on the #emic and #dhsi2010 Twitter feeds was the sudden emergence mid-week of a stream of discourse surrounding gender and the digital humanities. I thought that it might be revealing to compare ways in which the #emic and #dhsi2010 feeds differ. It turns out that just one tweet in the EMiC stream mentioned Susan Brown, and none of us picked up on the gender and digital humanities discussion that cropped up in the #dhsi stream. Obviously what this tells me is that Voyeur is only really useful if the documents you’re comparing contain the same keywords. I don’t think it really tells us much about the disposition of EMiC tweeters toward questions of gender. You can check out the results for yourself. Just paste the report URLs above into Voyeur. You’ll need to generate a favourites list of keywords (gender, women, female, feminism, etc.).
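
The kind of comparison described here can be sketched in a few lines of Python. This is only an illustration of counting a favourites list across two documents, not Voyeur's own method. The function name and both sample texts are hypothetical placeholders, not content from the actual feeds.

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Return how often each keyword occurs as a whole word in the text."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {k: counts[k] for k in keywords}

favourites = ["gender", "women", "female", "feminism"]

# Placeholder texts standing in for the two scraped feeds.
feeds = {
    "#emic": "imt workshop today, tei and xml all afternoon",
    "#dhsi2010": "great talk on gender and dh; women in dh panel next, more on gender later",
}

for name, text in feeds.items():
    print(name, keyword_counts(text, favourites))
```

When one document contains none of the favourites, every count for it is zero, and there is nothing to compare, which is the limitation noted above.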

Blogospherics

So, after that lackluster result, I thought that I’d turn my webscraper to the EMiC blog. I uploaded the individual URLs for each post to Voyeur. You can play around with some keywords on your own. Here’s the URL to our blog corpus: http://voyeur.hermeneuti.ca/?corpus=1276258984042.7345. For the screenshot below I picked IMT, digital, and editing, since they represent keywords widely used across a significant number of blog posts. What the stats and visualization confirm is what we all probably already know based on your anecdotal reports from DHSI reflected in Twitter and blog activity: we’re all keenly interested in the development of IMT. I’ve been particularly impressed by the initiative of DEMiC participants to work directly with Meagan on IMT. What we need to do from here is to sustain that dialogue through the blog over the summer and take the opportunity again this fall at the EMiC conference to reprise our conversations about IMT and assess what we’ve done so far in helping to implement the features, standards, and protocols that we’ve been discussing. Looking ahead to DHSI 2011, we’ll be in a position to ramp up our work with IMT in a specialized image-markup and edition-production seminar. I’d be very interested to see even more blog posts about curricular desiderata for that course.

EMiC Blog in Voyeur

Back to blog analytics. What I did next was scrape each individual blog post into JiTR using a feature called a text aggregator, which allowed me to list all of the URLs for each blog posting and scrape everything at once.

JiTR Text Aggregator
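
If you don’t have a JiTR account, the general idea of a text aggregator (take a list of URLs, pull the text from each, and join it into one document) can be approximated in Python. This is a rough sketch under stated assumptions, not JiTR’s implementation: the function names are hypothetical, the tag stripping is crude, and the demo uses in-memory placeholder pages rather than real blog posts so that it runs without a network connection.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class _TextOnly(HTMLParser):
    """Collect the text between tags, ignoring the tags themselves."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def fetch(url):
    """Download one page; only used when no other fetcher is supplied."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def aggregate(urls, fetch_page=fetch):
    """Scrape every URL in the list and join the texts into one document."""
    return "\n\n".join(html_to_text(fetch_page(u)) for u in urls)

# Placeholder pages keyed by made-up identifiers, standing in for blog posts.
pages = {
    "post-1": "<h1>IMTweet</h1><p>Word clouds</p>",
    "post-2": "<p>Voyeur</p>",
}
combined = aggregate(["post-1", "post-2"], fetch_page=pages.get)
print(combined)
```

Note that this simple parser also keeps any script or style text on a page, so real pages would still need the kind of manual cleanup described earlier.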

Then I uploaded the blog report URL (http://ra.tapor.ualberta.ca/~jitr/contents/show_report/5038?key=474528093615860579662) into Voyeur. As a comparator document, I also uploaded an updated scraping of the #emic Twitter feed. Let’s see what it tells us about differences between our blogging and tweeting about IMT.

IMT on EMiC Blog and Twitter Feed

There’s not all that much data to work with here, but what we might draw from this analysis is that IMT spiked early on in the Twitter feed (the green line) and was picked up on the blog (the blue line). We could extrapolate from this that our practice is a fairly common representation of the interactive relationship between tweeting and blogging: the conversation begins with probing tweets followed by more expansive commentary on blogs. That, in any case, was one of the reasons why Meg and I wanted EMiCites to tweet during DHSI: it’s not the Twitter feed that generates substantive content, but it does initiate a social exchange of ideas that finds its way to more formalized forums such as blogs, and as Emily has already suggested, journals. These are conversations that will also insinuate themselves into our roundtables and panels at the Conference on Editorial Problems at the University of Toronto in October, and they will be reprised during the EMiC roundtable session on Editorial Networks and Modernist Remediations at the Modernist Studies Association conference in Victoria in November. These conversations will in turn end up in the edited essay collections and special journal issues that we have planned. And, ultimately, our blogs and tweets from DHSI will inform the design and functionality of our digital editions, archives, and commons and their interoperability with toolkits such as those in development by Susan Brown’s CWRC project. It’s a discursive exchange that links print and digital media, a dialogic interaction that extends from tweets to editions, blogs to essays, digital toolkits to roundtables.

A Voyeur’s Peep] Tweet

Dean Irvine — Wed, 09 Jun 2010 19:19:06 +0000

To build on Stéfan Sinclair’s plenary talk at DHSI yesterday afternoon, I thought it appropriate to put Voyeur into action with some born-digital EMiC content. Perhaps one day someone will think to produce a critical edition of EMiC’s Twitter feed, but in the meantime, I’ve used a couple of basic digital tools to show you how you can take ready-made text from online sources and plug it into a text-analysis and visualization tool such as Voyeur.

I started with a tool called Twapper Keeper, which is a Twitter #hashtag archive. When we were prototyping the EMiC community last summer and thinking about how to integrate Twitter into the new website, Anouk had the foresight to set up a Twapper Keeper hashtag archive (also, for some reason, called a notebook) for #emic. From the #emic hashtag notebook at the Twapper Keeper site, you can simply share the archive with people who follow you on Twitter or Facebook, or you can download it and plug the dataset into any number of text-analysis and visualization tools. (If you want to try this out yourself, you’ll need to set up a Twitter account, since the site will send you a tweet with a link to your downloaded hashtag archive.) Since Stéfan just demoed Voyeur at DHSI, I thought I’d use it to generate some EMiC-oriented text-analysis and visualization data. If you want to play with Voyeur on your own, I’ve saved the #emic Twitter feed corpus (which is DH jargon for a dataset, or more simply, a collection of documents) that I uploaded to Voyeur. I limited the dates of the data I exported to the period from June 5th to early in the day on June 9th, so the corpus represents the #emic feed during the first few days of DHSI. Here’s a screenshot of the tool displaying Twitter users who have included the #emic hashtag:

#emic hashtag Twitter feed, 5-9 June 2010

As a static image, it may be difficult to tell exactly what you’re looking at and what it means. Voyeur allows you to perform a fair number of manipulations (selecting keywords, using stop word lists) so that you can isolate the information about word frequencies within a single document (as in this instance) or a whole range of documents. As a simple data visualization, the graph displays the relative frequency of the occurrence of Twitter usernames of EMiCites who are attending DHSI and who have posted at least one tweet using the #emic hashtag. To isolate this information I created a favourites list of EMiC tweeters from the full list of words in the #emic Twitter feed. If you wanted to compare the relative frequency of the words “emic” and “xml” and “tei” and “bunnies,” you could either enter these words (separated by commas) into the search field in the Words in the Entire Corpus pane or manually select these words by scrolling through all 25 pages. (It’s up to you, but I know which option I’d choose.) Select these words and click the heart+ icon to add them to your favourites list. Then make sure you select them in the Words within the Document pane to generate a graph of their relative frequency. If you want to see the surrounding context of the words you’ve chosen, you can expand the snippet view of each instance in the Keywords in Context pane.
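
To make “relative frequency” and “keywords in context” a little more concrete, here is a small Python sketch of both ideas. It is an illustration only: relative frequency is taken here to be a word’s count divided by the total number of words, which may not match the scaling Voyeur uses, and the function names and sample sentence are made up.

```python
import re

def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"\w+", text.lower())

def relative_frequency(tokens, word):
    """Share of all tokens that are the given word (0.0 for an empty document)."""
    return tokens.count(word) / len(tokens) if tokens else 0.0

def keywords_in_context(tokens, word, window=2):
    """Return each occurrence of the word with `window` tokens on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"{left} [{tok}] {right}".strip())
    return hits

# Placeholder sentence, not taken from the #emic feed.
tokens = tokenize("TEI and XML today, then more TEI, and maybe bunnies")

for word in ["emic", "xml", "tei", "bunnies"]:
    print(word, relative_frequency(tokens, word))

print(keywords_in_context(tokens, "tei"))
```

Dividing by the document length is what lets you compare a short Twitter feed with a much longer blog corpus on the same graph.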

Go give it a try. The tool’s utility is best assessed by actually playing around with it yourself. If you’re still feeling uncertain about how to use the tool, you can watch Stéfan run through a short video demo.

While you’re at it, can you think of any ways in which we might implement a tool such as Voyeur for the purposes of text analysis of EMiC digital editions? What kinds of text-analysis and visualization tools do you want to see integrated into EMiC editions? If you come across something you really find useful, please let me know (dean.irvine@dal.ca). Or, better, blog it!

Voyeur – Editing Modernism in Canada
