IMTweet
Editing Modernism in Canada, Fri, 11 Jun 2010
http://editingmodernism.ca/2010/06/imtweet/

#emicClouds

A little DHSI playtime for you. First, two word clouds: one of the DHSI Twitter feed, the other of the EMiC Twitter feed. Both feeds were collected using the JiTR webscraper, a beta tool in development by Geoffrey Rockwell at the University of Alberta.

#emic Twitter Feed in JiTR

How did I do this? First I scraped the text from the Twapper Keeper #dhsi2010 and #emic archives into JiTR. I did this because I wanted to clean it up a bit and take out some of the irrelevant header and footer text. Because JiTR lets you clean up the text (which is not an option in the Twapper Keeper export), you don’t have to work with messy bits that you don’t want to analyze. After that I saved my clean texts and generated what are called “reports.” The report feature creates a permanent URL that you can then paste into various TAPoRware tools. I ran the reports of the #dhsi2010 and #emic feeds through two TAPoRware text-analysis tools, Voyeur and Word Cloud.
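For anyone who wants to try the same clean-then-count idea without JiTR or TAPoRware, here is a minimal Python sketch. It is not how either tool works internally; the function names, the tiny stopword list, and the sample lines are all assumptions made up for illustration. It drops a fixed number of header and footer lines and then counts the remaining words, which is the kind of frequency table a word cloud is drawn from.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "at", "rt"}

def clean_export(lines, header_lines=0, footer_lines=0):
    """Drop a fixed number of leading and trailing lines (assumed to be boilerplate)."""
    end = len(lines) - footer_lines
    return lines[header_lines:end]

def word_frequencies(text):
    """Lowercase the text, keep word and hashtag tokens, and count the non-stopwords."""
    tokens = re.findall(r"#?[a-z0-9']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# Invented sample standing in for a scraped archive page.
raw = [
    "Archive export header",
    "RT @someone: Loving the TEI workshop at #dhsi2010",
    "Image markup and TEI at #dhsi2010 today",
    "Archive export footer",
]
tweets = clean_export(raw, header_lines=1, footer_lines=1)
freqs = word_frequencies(" ".join(tweets))
print(freqs.most_common(3))
```

The counts in `freqs` are what a word cloud would scale its font sizes by; the header and footer lines never reach the counter because they are removed first.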

#emic Twitter Feed in TAPoR Word Cloud

#dhsi2010 Twitter Feed in TAPoR Word Cloud

If you want to generate these word clouds and interact with them, paste the report URLs I generated using JiTR into the TAPoR Word Cloud tool.

[#emic]

http://ra.tapor.ualberta.ca/~jitr/contents/show_report/4848?key=921556750373594176491

[#dhsi2010]

http://ra.tapor.ualberta.ca/~jitr/contents/show_report/4847?key=78555118459803099008

If you want to try JiTR and do some webscraping and aggregating on your own, let me know and I’ll put you in touch with Geoffrey Rockwell. You’ll need a username and password to test it out.

Gender Peeps

One of the many things that caught my attention on the #emic and #dhsi2010 Twitter feeds was the sudden emergence mid-week of a stream of discourse surrounding gender and the digital humanities. I thought that it might be revealing to compare the ways in which the #emic and #dhsi2010 feeds differ. It turns out that just one tweet in the EMiC stream mentioned Susan Brown, and none of us picked up on the gender and digital humanities discussion that cropped up in the #dhsi stream. What this tells me is that a keyword comparison in Voyeur is only really useful if the documents you’re comparing contain the same keywords. I don’t think it really tells us much about the disposition of EMiC tweeters toward questions of gender. You can check out the results for yourself. Just paste the report URLs above into Voyeur. You’ll need to generate a favourites list of keywords (gender, women, female, feminism, etc.).
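The favourites-list comparison can also be approximated in a few lines of Python. This is only a sketch under my own assumptions, not a description of what Voyeur does: the function names are hypothetical and the two sample texts are invented placeholders, not actual tweets. It counts each keyword in each document so the gaps between the two feeds are visible side by side.

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Count how often each keyword occurs as a whole word in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {k: counts[k] for k in keywords}

def compare(docs, keywords):
    """Build a {document name: {keyword: count}} table for all documents."""
    return {name: keyword_counts(text, keywords) for name, text in docs.items()}

favourites = ["gender", "women", "female", "feminism"]

# Invented placeholder texts standing in for the two scraped feeds.
docs = {
    "emic": "Great session on image markup and editing today",
    "dhsi2010": "Gender and the digital humanities: where are the women? Gender matters.",
}

table = compare(docs, favourites)
for name, counts in table.items():
    print(name, counts)
```

A row of zeros for one document, as in the placeholder "emic" text here, shows the same limitation noted above: the comparison says little when one document simply never uses the keywords.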

Blogospherics

So, after that lackluster result, I thought that I’d turn my webscraper to the EMiC blog. I uploaded the individual URLs for each post to Voyeur. You can play around with some keywords on your own. Here’s the URL to our blog corpus: http://voyeur.hermeneuti.ca/?corpus=1276258984042.7345. For the screenshot below I picked IMT, digital, and editing, since they represent keywords widely used across a significant number of blog posts. What the stats and visualization confirm is what we all probably already know based on your anecdotal reports from DHSI reflected in Twitter and blog activity: we’re all keenly interested in the development of IMT. I’ve been particularly impressed by the initiative of DEMiC participants to work directly with Meagan on IMT. What we need to do from here is to sustain that dialogue through the blog over the summer and take the opportunity again this fall at the EMiC conference to reprise our conversations about IMT and assess what we’ve done to help implement the features, standards, and protocols that we’ve been discussing so far. Looking ahead to DHSI 2010, we’ll be in a position to ramp up our work with IMT in a specialized image-markup and edition-production seminar. I’d be very interested to see even more blog posts about curricular desiderata for that course.

EMiC Blog in Voyeur

Back to blog analytics. What I did next was scrape each individual blog post into JiTR using a feature called a text aggregator, which allowed me to list all of the URLs for each blog posting and scrape everything at once.
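To give a rough idea of what a text aggregator does, here is a minimal Python sketch that takes a list of URLs, pulls the visible text out of each page, and joins everything into one corpus. It is an assumption-laden illustration rather than JiTR's actual implementation: the class and function names are hypothetical, the example.org addresses are placeholders, and the demonstration uses canned HTML so that it runs without a network connection.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def fetch(url):
    """Download a page over HTTP (used only when no other fetcher is supplied)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def aggregate(urls, fetcher=fetch):
    """Scrape every URL in the list and join the texts into a single corpus."""
    return "\n\n".join(html_to_text(fetcher(u)) for u in urls)

# Offline demonstration: canned pages keyed by placeholder URLs.
fake_pages = {
    "http://example.org/post-1": "<html><body><h1>Post one</h1><p>IMT notes</p></body></html>",
    "http://example.org/post-2": "<html><style>p{}</style><p>Editing &amp; markup</p></html>",
}
corpus = aggregate(list(fake_pages), fetcher=fake_pages.__getitem__)
print(corpus)
```

Passing the fetcher in as a parameter is a design choice for the sketch: the same `aggregate` call works against live pages with the default `fetch`, or against saved copies for testing.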

JiTR Text Aggregator

Then I uploaded the blog report URL (http://ra.tapor.ualberta.ca/~jitr/contents/show_report/5038?key=474528093615860579662) into Voyeur. As a comparator document, I also uploaded an updated scraping of the #emic Twitter feed. Let’s see what it tells us about differences between our blogging and tweeting about IMT.
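A trend comparison of this kind can be sketched in Python by cutting each document into equal segments and measuring how often a term turns up in each one. This is my own simplified approximation, not Voyeur's algorithm; the function names and the two sample strings are invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def segment_frequencies(text, term, segments=5):
    """Split the tokens into roughly equal segments and return the term's
    relative frequency (occurrences / tokens) within each segment."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * segments
    size = -(-len(tokens) // segments)  # ceiling division
    term = term.lower()
    result = []
    for i in range(segments):
        chunk = tokens[i * size:(i + 1) * size]
        result.append(chunk.count(term) / len(chunk) if chunk else 0.0)
    return result

# Invented sample texts: the term appears early in one and late in the other.
twitter_text = "IMT IMT demo today then lunch and talks later on"
blog_text = "notes from the week more thoughts on IMT and IMT"

twitter_trend = segment_frequencies(twitter_text, "imt", segments=2)
blog_trend = segment_frequencies(blog_text, "imt", segments=2)
print("twitter:", twitter_trend)
print("blog:   ", blog_trend)
```

Plotting the two lists against segment number gives one line per document, with peaks wherever the term clusters.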

IMT on EMiC Blog and Twitter Feed

There’s not all that much data to work with here, but what we might draw from this analysis is that IMT spiked early on in the Twitter feed (the green line) and was picked up on the blog (the blue line). We could extrapolate from this that our practice is a fairly common representation of the interactive relationship between tweeting and blogging: the conversation begins with probing tweets followed by more expansive commentary on blogs. That, in any case, was one of the reasons why Meg and I wanted EMiCites to tweet during DHSI: it’s not the Twitter feed that generates substantive content, but it does initiate a social exchange of ideas that finds its way to more formalized forums such as blogs and, as Emily has already suggested, journals. These are conversations that will also insinuate themselves into our roundtables and panels at the Conference on Editorial Problems at the University of Toronto in October, and they will be reprised during the EMiC roundtable session on Editorial Networks and Modernist Remediations at the Modernist Studies Association conference in Victoria in November. These conversations will in turn end up in the edited essay collections and special journal issues that we have planned. And, ultimately, our blogs and tweets from DHSI will inform the design and functionality of our digital editions, archives, and commons and their interoperability with toolkits such as those in development by Susan Brown’s CWRC project. It’s a discursive exchange that links print and digital media, a dialogic interaction that extends from tweets to editions, blogs to essays, digital toolkits to roundtables.
