June 11, 2010



A little DHSI playtime for you. First, two word clouds: one of the DHSI Twitter feed, the other of the EMiC Twitter feed. Both feeds were collected using the JiTR webscraper, a beta tool in development by Geoffrey Rockwell at the University of Alberta.

#emic Twitter Feed in JiTR

How did I do this? First I scraped the text from the Twapper Keeper #dhsi2010 and #emic archives into JiTR. I did this because I wanted to clean it up a bit and take out some of the irrelevant header and footer text. Because JiTR allows you to clean up the text (which is not an option in the Twapper Keeper export), you don’t have to work with messy bits that you don’t want to analyze. After that I saved my clean texts and generated what are called “reports.” The report feature creates a permanent URL that you can then paste into various TAPoRware tools. I ran the reports of the #dhsi2010 and #emic feeds through two TAPoRware text-analysis tools, Voyeur and Word Cloud.
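For anyone curious about what the clean-up and word-counting steps amount to, here is a minimal Python sketch of the same idea. It does not use JiTR or TAPoRware and is not how those tools work internally. The function names, the stopword list, and the sample lines are all made up for illustration, and it assumes the header and footer are a fixed number of lines at the top and bottom of the export.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; not the list any TAPoR tool uses.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "at", "on", "rt"}


def strip_header_footer(lines, header_lines=1, footer_lines=1):
    """Drop a fixed number of lines from the top and bottom of an export."""
    end = len(lines) - footer_lines
    return lines[header_lines:end]


def word_frequencies(lines):
    """Count lowercase word and hashtag tokens, skipping stopwords."""
    counts = Counter()
    for line in lines:
        for token in re.findall(r"#?[a-z0-9']+", line.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


# Made-up sample standing in for a scraped archive (assumption, not real tweets).
sample = [
    "Archive export header",
    "Learning TEI at #dhsi2010 today",
    "RT: word clouds for #emic and #dhsi2010",
    "Archive export footer",
]

cleaned = strip_header_footer(sample)
freqs = word_frequencies(cleaned)

# The most frequent tokens are the ones a word cloud would draw largest.
for word, count in freqs.most_common(3):
    print(word, count)
```

The counts are the raw material of a word cloud: the more often a word appears, the larger it is drawn.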

#emic Twitter Feed in TAPoR Word Cloud

#dhsi2010 Twitter Feed in TAPoR Word Cloud

If you want to generate these word clouds and interact with them, paste the report URLs I generated using JiTR into the TAPoR Word Cloud tool.

June 7, 2010

A New Build: EMiC Tools in the Digital Workshop

DEMiC +1

On the occasion of our 2010 DEMiC summer institute, I’d like to present an interim report on EMiC’s major digital initiatives, our new institutional partnerships, and our four streams of collaborative digital-humanities research: (1) digitization, (2) image-based editing and markup, (3) text analysis, and (4) visualization.

Last June I trekked out to Victoria to attend the Digital Humanities Summer Institute with a group of graduate students, postdocs, and faculty affiliated with the EMiC project. There were a dozen of us; some came with skills and digital editing projects in the works, others were standing at the bottom of the learning curve staring straight up. Most enrolled in one of the two introductory courses in text encoding or digitization fundamentals. Meagan Timney, who is our first EMiC postdoctoral fellow, and I enrolled in Susan Brown and Stan Ruecker’s seminar on Digital Tools for Literary History. They introduced us to a whole range of text-analysis and visualization tools. I started to pick and choose tools that I thought might be useful for the EMiC kit. These tools are principally intended for the analysis of text datasets, either plain vanilla transcriptions of the kind that one finds on Project Gutenberg or enriched transcriptions marked up in XML. The common denominator is obvious enough: these tools are designed to work with transcribed texts. But what if I wanted tools to work with texts rendered as digital images? What if I didn’t want to read transcribed texts but instead wanted to use tools that could read encoded digital images of remediated textual objects? What kind of tools are being developed for linking marked-up transcriptions to images? How can these tools be employed by scholarly editors?

