I have only dipped my toes into the waters of digital humanism. But now that I’ve spent some time thinking about how to store, organize, and share digital information – and especially after a week of learning TEI at TEMiC – I want to learn how to swim. What follows is my roundup of the EMiC-funded RA project I recently finished. I hope that by offering some of the details about my project, we can start a discussion about tools, processes, and protocols for similar projects.
In January 2010, I began working as an RA for Dean Irvine, gathering information towards the annotations for the edition of FR Scott’s poetry that he is co-editing with Robert G. May (Auto-Anthology: Complete Poems and Translations, 1918-84). I am a PhD candidate at McGill, and I was hired by Dean Irvine to examine holdings in the FR Scott collection, which is housed in the Rare Books Room here. Dean wanted me to record annotations in Scott’s books of English-language poetry; this was intended to help the editors make decisions about what to annotate in the edition. I was to be his eyes for this part of the project, sifting through hundreds of books to find what was useful, thus allowing Dean to get the information he needed while staying home in Halifax. It turns out that being someone else’s eyes is a serious challenge, because the two sets of eyes happen to be attached to two different brains. Before I elaborate on this challenge, though, I’d like to explain what this recording of annotations involved.
There are two separate finding aids for the Scott collection. I only found out about the second because, while working in the Rare Books Room one afternoon, I happened to ask to see a hardcopy of the finding aid I’d been working with (I had a scan, but it was not entirely legible). Scott’s books were donated to McGill in two groups: the contents of his Law Faculty office, which were given to the library shortly after his death; and his personal library from his home, which his widow donated in 1988. The first finding aid – the one I had from the beginning, which seems to correspond to Scott’s personal library – was drawn up in 1990, and there is a manuscript note on its title page, indicating that the list needs revision but is complete. For reasons that remain obscure, a second finding aid was drawn up in 1994; it, too, appears to contain books from the second donation, and makes no reference to a third donation (which would easily explain the new finding aid, but which I have no reason to believe occurred). There does not appear to be a finding aid for the Law office books; they are now sitting in boxes, uncatalogued, in the Rare Books Room, because the Law Library decided they weren’t interested in them.
I have offered this detailed account of the collection’s provenance because it is the clearest way to explain the organizational challenges I was facing. Both finding aids contained material that I was being asked to examine, but they were organized differently: the first was grouped by genre and language, and organized alphabetically within those groups, making it relatively easy to pick out the English-language poetry. The second was strictly alphabetical, so that unless I was familiar with the author or title, it was difficult to know whether a work was poetry or prose. To complicate the matter further, Dean had requested that poetry from before 1880 be excluded from my search – this narrowed the field, but made it a bit difficult to spot unfamiliar early works which Scott owned in modern editions. Before I could even begin looking at the books, this data had to be sorted and organized. No mean feat, particularly when I began with documents that had been scanned – OCR, as we’re all aware, is imperfect enough to cause serious aggravation.
And this is where I learned that I have a lot to learn about being a digital humanist. I recorded my findings in Word, because that is what I am used to using to do my work. Not only was it difficult to turn the scanned document into a pretty Word document – it was also tough to organize the entries in Word. In hindsight, it would have been much better to work in Excel. Doing so would have prompted me to think harder about how the data would be used and could be sorted: in retrospect, I envision columns indicating whether there are annotations, ephemera, Scott’s signature, inscriptions from others, and so on, in addition to the kind of discursive analysis that I provided. This raises the important matter of considering, from the get-go, the best tools for the job and the best ways to use those tools.
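To make the columns I have in mind a little more concrete, here is a minimal sketch in Python of what one row per book might look like. It is only an illustration of the idea, not what I actually built: every field name is my own invention, and the sample values used to try it out should be placeholders rather than real entries from the collection.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class BookRecord:
    """One row per book examined; all field names are hypothetical."""
    author: str
    title: str
    finding_aid: str        # which finding aid lists the book, e.g. "1990" or "1994"
    has_annotations: bool
    has_ephemera: bool
    has_signature: bool
    has_inscription: bool
    notes: str = ""         # room for the discursive analysis


def to_csv(records):
    """Serialize the records as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(BookRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def annotated_only(records):
    """Keep only the books that contain annotations."""
    return [record for record in records if record.has_annotations]
```

The point of the yes/no columns is that questions like "which books are annotated?" become a filter rather than a re-reading of my notes, while the free-text column still leaves room for description.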
A related problem, as I mentioned above, is the difficulty of being another person’s eyes. Although I knew the purpose of the information I was gathering, it was difficult for me to know what was important information, so I took note of almost everything. Again, in retrospect, I’m pretty sure that taking note of the inscriptions to Scott, particularly in books that were otherwise un-annotated, was not an effective use of my time. Ultimately, I spent 10 hours this summer creating an Excel sheet that contained almost everything but the inscriptions. Such retrospective claims are intended as a reminder to me, and I hope to you, of the importance of thinking through what information we really want to gather, and how it should be organized. (I do, however, have a complete record of who wrote what in Scott’s books of poetry… you know, just in case.) Naturally, there are also things I might have taken note of that I didn’t: I did not regularly take note of the condition of the book, but only noted particularly interesting cases. I can imagine that knowing whether a book looked read or not might, in the end, be helpful in deciding whether a perceived allusion to something in that book was real.
A final point about data sharing: though I was sometimes able to provide a discursive account of an annotation, often it was necessary to have the pertinent page (or the ephemera) scanned. McGill’s Rare Books Room does not have self-service scanning; each scan costs 25 cents, there is a limit of 10 scans per book, and a detailed scan form had to be completed to order the scan. Each image is returned as its own JPG with an incomprehensible numeric file name. So, when I had multiple scans from the same book, I needed to convert them to PDFs, combine them, and rename the file so that it could be clearly associated with the book it represented. Then, I had to upload those files to Google Docs, so they could be shared with Dean. All of this made for a really cumbersome process – one which I hope it might be possible to streamline should a similar case arise in the future. Perhaps it’s even possible to do this better now, but I’m just not aware of the tools to do it.
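For what it’s worth, the renaming step at least seems like something a short script could take over. The sketch below, in Python, is a guess at how that might look rather than a description of anything I actually used: it assumes I keep a simple log pairing each numeric scan file name with a label for the book it came from, and the function names and naming pattern are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import PurePath


def slugify(text):
    """Lowercase the label and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def plan_renames(scan_log):
    """Turn (scan_filename, book_label) pairs into (old_name, new_name) pairs.

    Scans are numbered per book in the order they appear in the log, so all
    images from one book sort together under a readable name.
    """
    counters = defaultdict(int)
    plan = []
    for filename, book_label in scan_log:
        counters[book_label] += 1
        suffix = PurePath(filename).suffix.lower()
        new_name = f"{slugify(book_label)}_scan{counters[book_label]:02d}{suffix}"
        plan.append((filename, new_name))
    return plan
```

This only produces a list of old and new names; actually renaming the files, and merging each book’s images into a single PDF, would be further steps (the merging would need an imaging library that I have left out to keep the sketch self-contained).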
So… that’s what I’ve been up to. I hope that this narrative will be useful in helping other members of this community think about how we gather, organize, store, and share information. If there are tools out there that would have helped with this work, I’d love to hear about them… and I’d really like to hear your ideas about how we can plan a project so that we are using the available tools to their best advantage from the get-go. And in the meantime, I’m looking forward to signing up for more swimming lessons.
On June 9, 2010, Wired.com ran a story announcing the intention of DARPA, the experimental research arm of the United States Department of Defense, to create “mission planning software” based on the popular tax-filing software, Turbotax.
What fascinated the DoD was that Turbotax “encoded” a high level of expertise into its software, allowing people with “limited knowledge of [the] tax code” to negotiate successfully the complex tax-filing process that “would otherwise require an expert-level” of training (Shachtman). DARPA wanted to bring the power of complex “mission planning” to the average soldier who might not have enough time or expertise to make the best decision possible for the mission.
I start with this example to show that arcane realms of expertise, such as the U.S. Tax Code, can be made accessible to the general public through sound interface design and careful planning. This is especially pertinent to Digital Humanities scholars who do not always have the computer-science training of other disciplines but still rely on databases, repositories, and other computer-mediated environments to do their work. This usually means that humanities scholars spend hours having awkward phone conversations with technical support or avoid computer-mediated environments altogether.
With the arrival of new fields like Periodical Studies, however, humanities scholars must rely on databases and repositories for taxonomy and study. As Robert Scholes and Clifford Wulfman note in Modernism in the Magazines, the field of periodical studies is so vast that editors of print editions have had to make difficult choices in the past as to what information to convey since it would be prohibitively expensive to document all information about a given periodical (especially since periodicals tended to change dramatically over the course of their runs). Online environments have no such limitations and thus provide an ideal way of collecting and presenting large amounts of information. Indeed, Scholes and Wulfman call for “a comprehensive set of data on magazines that can be searched in various ways and organized so as to allow grouping by common features, or sets of common features” (54).
What DARPA and Turbotax realize is that computer-mediated environments can force submission compliance with existing “best practices” in order to capitalize on the uneven expertise levels of the general population. Wulfman and Scholes call for the creation of a modernist periodical database where modernist scholars can work together and map the field of periodical studies according to agreed-upon standards of scholarship. By designing a repository on a Turbotax model of submission compliance, the dream of a community-generated periodical database that conforms to shared bibliographic standards is readily attainable.
Because of the vastness of its subject matter, Periodical Studies is inherently a collaborative discipline: no one scholar has the capacity to know everything about every periodical (or everything about one magazine for that matter). Thus, the creation of a periodical database is necessary to map the field and gather hard data about modernist periodical production. The problem is that not every periodical scholar has the computer expertise to create or even navigate the complexities of database/repository systems. Nor does every scholar know how to follow the best metadata and preservation practices of archival libraries. We are now at a point where we can utilize the interests and expertise of humanities scholars by creating a repository that forces proper “input” along the lines of Turbotax.
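To give a rough sense of what forcing proper “input” might mean in practice, here is a small sketch in Python of a submission check. It is an illustration under my own assumptions, not a specification: the required fields and the four-digit-year rule are hypothetical stand-ins for whatever bibliographic standard a community would actually agree on.

```python
# Hypothetical required fields; a real project would take these from its
# agreed bibliographic standard.
REQUIRED_FIELDS = ("title", "place_of_publication", "start_year")


def validate_submission(record):
    """Return a list of plain-language problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not str(record.get(name, "")).strip():
            problems.append(f"Missing required field: {name}")
    year = str(record.get("start_year", "")).strip()
    if year and not (year.isdigit() and len(year) == 4):
        problems.append("start_year must be a four-digit year")
    return problems
```

A front end built on this idea would refuse to accept a record until the list of problems is empty, and would show those messages to the contributor in plain language, much as tax software will not let you file with a blank field.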
Challenge
I use the example of periodical studies to challenge the greater field of Digital Humanities. Our discipline has now reached a mature age, and I think we can all agree that the battle between “Humanities Computing” and “Digital Humanities” should be put to rest as we move into the next phase of the field: designing user-friendly interfaces based on a Turbotax model of user input. For example, even at this stage of Digital Humanities, there doesn’t appear to be a web-based TEI editor that can link with open repositories like Fedora Commons. In fact, the best (and most stable) markup tool I’ve used thus far is Martin Holmes’s Image Markup Tool at the University of Victoria. Even this useful bit of software is tied to the Windows OS, and it operates independently of repository systems. That means a certain level of expertise is needed to export the IMT files to a project’s repository system. In other words, the process of marking up the text is not intuitive for a project that wishes to harness the power of the many in marking up texts (by far the most time-consuming part of creating a digital edition). Why not create a Digital Humanities environment that, once installed on a server, walks a user through the editing process, much like Turbotax walks a user through his/her taxes? I used to work as an editor for the James Joyce Quarterly. I experienced many things there, but the most important thing I learned is that there is a large community of people (slightly insane) who are willing to dedicate hours of their time to dissecting and analyzing Joyce. Imagine what a user-generated Ulysses would look like with all of that input! (We would, of course, have to ban Stephen Joyce from using it – or at least not tell him about it.)
Digital Humanities Ecosystems
The story of Digital Humanities is one littered with failed or incomplete tools. I suspect that, save for the few stalwarts working in labs like Martin Holmes’s, or our colleagues in Virginia, Georgia, and elsewhere, tools are dependent on stubborn coders with enough time to do their work. I find this to be a very inefficient way of designing tools and a system too dependent on personalities. I know of a handful of projects right now attempting to design a web-based TEI editor, but I’m not holding my breath for any one of them to be finished soon (goals change, after all). Instead of thinking of Digital Humanities development in these piecemeal terms, I think we need to come together as a federation to design ECOSYSTEMS of DH work – much like Turbotax walks one through the entire process of filing taxes.
I think the closest thing we have to this right now is Omeka, whose user base grows daily. What if we took Omeka’s ease of use for publishing material online and made it into a full ingestion and publication engine? We don’t need to reinvent the wheel, after all: librarians have already shown us how we should store our material according to Open Archival Standards. There is even an open repository system in Fedora Commons. We even know what type of markup we should be using: TEI and maybe RDF. And Omeka has shown us how beautiful publication can be on the web.
Now, Digital Humanists, it is our time to take this knowledge and create archives/databases based on the Turbotax model of doing DH work: we need to create living ecosystems where each step of digitizing a work is clearly provided by a front end to the repository. Discovery Garden is working on such an ecosystem right now with the Islandora framework (a Fedora Commons backend with a Drupal front end), and I hope it will truly provide the first “easy-to-use” system that, once installed on a server, will allow all members of a humanist community to partake in digital humanities work. If I’m training students to encode TEI, why can’t I do so online, actually encoding TEI for NINES or other projects? I’ve been in this business for years now, and even I get twitchy running shell scripts; my colleagues and students are even more nervous. So let’s build something for them, so they can participate in the digital humanities as well. Everyone has something to gain.
I am attempting to harness the power of the crowd with “the Database of Modernist Periodicals,” to be announced this summer. I’ll let you know how it goes.
I end with this caveat: We need to prepare for the day when the “digital” humanities will simply be “the humanities,” and that means democratizing the digital (especially in our tools). Even I was able to file my taxes this year.
I thought those of us who had been to DHSI and who were fortunate enough to take the TEI course with Julia Flanders and Syd Bauman might be interested in a recent interview with Julia, in which she puts the TEI Guidelines and the digital humanities into the wider context of scholarship, pedagogy and the direction of the humanities more generally. (I also thought others might be reassured, as I was, to see someone who is now one of the foremost authorities on TEI describing herself as being baffled by the technology when she first began as a graduate student with the Women Writers Project …)
Here are a few excerpts to give you a sense of the piece:
[on how her interest in DH developed] I think that the fundamental question I had in my mind had to do with how we can understand the relationship between the surfaces of things – how they make meaning and how they operate culturally, how cultural artefacts speak to us. And the sort of deeper questions about materiality and this artefactual nature of things: the structure of the aesthetic, the politics of the aesthetic; all of that had interested me for a while, and I didn’t immediately see the connections. But once I started working with what was then what would still be called humanities computing and with text encoding, I could suddenly see these longer-standing interests being revitalized or reformulated or something like that in a way that showed me that I hadn’t really made a departure. I was just taking up a new set of questions, a new set of ways of asking the same kinds of questions I’d been interested in all along.
I sometimes encounter a sense of resistance or suspicion when explaining the digital elements of my research, and this is such a good response to it: to point out that DH methodologies don’t erase considerations of materiality but rather can foreground them by offering new and provocative optics, and thereby force us to think about them, and how to represent them, with a set of tools and a vocabulary that we haven’t had to use before. Bart’s thoughts on versioning and hierarchies are one example of this; Vanessa’s on Project[ive] Verse are another.
[discussing how one might define DH] the digital humanities represents a kind of critical method. It’s an application of critical analysis to a set of digital methods. In other words, it’s not simply the deployment of technology in the study of humanities, but it’s an expressed interest in how the relationship between the surface and the method or the surface and the various technological underpinnings and back stories — how that relationship can be probed and understood and critiqued. And I think that that is the hallmark of the best work in digital humanities, that it carries with it a kind of self-reflective interest in what is happening both at a technological level – and it’s what is the effect of these digital methods on our practice – and also at a discursive level. In other words, what is happening to the rhetoric of scholarship as a result of these changes in the way we think of media and the ways that we express ourselves and the ways that we share and consume and store and interpret digital artefacts.
Again, I’m struck by the lucidity of this, perhaps because I’ve found myself having to do a fair bit of explaining of DH in recent weeks to people who, while they seem open to the idea of using technology to help push forward the frontiers of knowledge in the humanities, have had little, if any, exposure to the kind of methodological bewilderment that its use can entail. So the fact that a TEI digital edition, rather than being some kind of whizzy way to make bits of text pop up on the screen, is itself an embodiment of a kind of editorial transparency, is a very nice illustration.
[on the role of TEI within DH] the TEI also serves a more critical purpose which is to state and demonstrate the importance of methodological transparency in the creation of digital objects. So, what the TEI, not uniquely, but by its nature brings to digital humanities is the commitment to thinking through one’s digital methods and demonstrating them as methods, making them accessible to other people, exposing them to critique and to inquiry and to emulation. So, not hiding them inside of a black box but rather saying: look this, this encoding that I have done is an integral part of my representation of the text. And I think that the — I said that the TEI isn’t the only place to do that, but it models it interestingly, and it provides for it at a number of levels that I think are too detailed to go into here but are really worth studying and emulating.
I’d like to think that this is a good description of what we’re doing with the EMiC editions: exposing the texts, and our editorial treatment of them, to critique and to inquiry. In the case of my own project involving correspondence, this involves using the texts to look at the construction of the ideas of modernism and modernity. I also think the discussions we’ve begun to have as a group about how our editions might, and should, talk to each other (e.g. by trying to agree on the meaning of particular tags, or by standardising the information that goes into our personographies) are part of the process of taking our own personal critical approaches out of the black box, and holding them up to the scrutiny of others.
The entire interview – in plain text, podcast and, of course, TEI format – can be found on the TEI website here.
Does anybody know if there has been much theorization of digital anti-humanism?
With my now mandatory cup of java in hand to fight the jet lag, I take my charmingly wobbly seat in the Henry Hickman Building eager to hear another great round of Graduate presentations. I was particularly enthusiastic this morning, as two of the presentations included the terms “Visual Representation” and “Curating” in their titles respectively – could these be indexes to the topics of visual arts or the handling of images in digital humanities (something I haven’t heard much [enough] about yet this week and which is central to my interests)?
No dice. Well… this isn’t entirely true. At least the paper with the promising “Visual Representation” in its title was about visual art—the Graffiti Research Lab to be specific. I won’t go into much discussion about that presentation here (although I would be pleased to hear what others thought of it), because I want to focus on the other presentation that had a sexy, but somewhat misleading, title: “Curating as Research: Digital Humanities and the Study of Culture in Real-Time.” The presentation had little to do with visual art, and little to do with art or archival curation. The premise: social media as curation.
The metaphor of social media as curation, although intriguing, sat uneasily in my mind for a number of reasons. I know the expression “digital curation” is a widely used phrase; but today the term “curator” was presented as a signifier for the ideas of “aggregating” and “presenting” – central activities of blogging, twittering, facebooking, etc., and which are supposedly related to art or archival curation. I started to feel sympathetic for any possible curators in the room, whose professional expertise was (in my opinion) greatly scaled down to two tasks. Ultimately, my discomfort with the metaphor really got me to thinking about metaphor. Why have a number of digital humanists this week felt compelled to “metaphorize” their roles, tasks, and projects, and what are the implications of analogizing the profession?
The question brings to mind Zailig and Emily’s paper on the Digital Page and “Respect des fonds,” which some of us read at TEMiC the other week. If I recall correctly, the brilliant authors caution against the metaphor of the “digital archive,” which “conceals the fact that rather than being a new and improved version for the postmodern age…[it] is conceptually no different than the pre-modern archive-as-collection. . . .” Digital collections are useful; but they are not the same as a fonds or archives. In fact, it may be the differences between these two resources that merit consideration and emphasis, and not the similarities.
My biggest concern is that by analogizing the role of digital humanists, we may be delimiting a fictive and unsuitable space for ourselves in the realm of scholarship and research. What I mean to ask, in this coffee-induced unsophisticated way, is: when we graft the activities and contributions of digital scholarship and research onto other well-known models of scholarship and research, do we risk imposing limitations on the field and under-acknowledging the value of what we do, and what others do? Or am I being overly critical? Besides the obvious communicative function, what value is there in speaking in metaphors?
Emily, Chris et al. were gracious enough to welcome us into their home-away-from-home for a little get-together this evening at DHSI. Fun was had – there’s photographic evidence:
The really engaged posts on the EMiC blog have really got me thinking… If we put all this effort into developing the “EMiC” brand of digital mark-up… Would it be possible to create an online graduate journal, or something similar, within which we could publish samples of our projects? Hosted through EMiC, or partnered with EMiC, but a distinct entity?
This way, as Chris suggests, we could develop a “house style” which all our projects could conform to, but also use and develop. It could be a collaboration among humanities scholars, digital humanists and computer scientists. If we worked with them to develop the tools we’d need to start out and get the ball rolling, then we would be able to self-teach through forums like the DHSI summer course.
Some of the small projects we might be interested in pursuing don’t necessarily have a forum for publication. This would give graduate students a chance to learn new technology, and then have an immediate application for it. They would know that their work had a possible “home” within the journal.
Obviously this is looking a little bit longer term, but it would be really amazing if we were able to lay the groundwork for this in the next few years, while we have an EMiC to support and engage us.
Perhaps it could be split into two parts, half scholarly articles about editing in print and online, and half documents or editorial projects that are entirely born digital.
I realize that I am perhaps being a bit overambitious… But I couldn’t help but take it to the next level. Thoughts??
Did I take it too far?
Did I?
In the unavoidable absence of Dean, Meagan and I did a report on EMiC. The aim of Scaling the Humanities is to explore ways in which the Digital Humanities can “scale up” to avoid unnecessary duplication, to encourage co-operative development, and to present the strongest possible face to granting bodies. The first couple of days have consisted of a series of reports from a wide range of projects. Tomorrow we will be pooling our various experiences.
Meagan, who is more knowledgeable about EMiC than I am – or than anyone is except for Dean – did a quick survey of the origins of EMiC and I carried on with a brief version of Dean’s update on recent developments, focussing on:
1. image-based editing and markup
2. digitization
3. text analysis
4. visualization
Response to the report was very positive. EMiC is clearly seen as a model digital humanities project, especially in its flexibility and openness to development in previously unforeseen directions. I will report on the results of tomorrow’s discussions and their relevance to EMiC.
Here it is, lunchtime on day one of the DHSI. As I happily munch on lunch with my fellow roommates, I feel a tinge of jealousy that I can’t retake the TEI Fundamentals course. This year, we are lucky enough to have 14 of the 19 EMiC participants enrolled in that class. Having that large a group to commiserate with is very helpful at the early stages of learning a new language. As P.K. Page struggles and goes silent because of the overwhelming nature of learning Portuguese in Brazil, so I too struggled with learning a language of angle brackets and abbreviations that has been a bit suppressed since my last visit to Victoria.
Returning now with a fresh face, I feel re-engaged with the digital tools. My new course, Transcribing Primary Sources, is much more invested in the bibliographic and social text features of the text. Matt just spent half an hour talking about all the ways you can describe the scribes who wrote the text and how to mark specific regional geography to “map” the transmission of the text. How awesome is that?
Because lunch is fast wrapping up, the last piece of news I want to share is about our afternoon project. I am doing digital mark-up fill in the blank! This guy definitely understands my abilities. I get to go hunting for the right information, but I also have the safety blanket of knowing that in this case there is a “right answer” which I can try to find.
Back to work, and I can’t wait to talk (and read!) about your experiences at DEMiC today!
Tonight we had a dinner gathering for EMiC participants at the Digital Humanities Summer Institute in Victoria, BC. Zailig Pollock did a great impression of our fearless director, Dean, and discussed some of the digital humanities initiatives with which EMiC is involved. After Zailig’s presentation, I showed off the new (in-progress) website.
I encourage our EMiC contingent to blog and tweet while at the DHSI. I’ve now created user accounts for everyone, and you should receive an email with your login details. If you haven’t received an email, please let me know by posting in the comments, or sending me a message (mbtimney.etcl@gmail.com). I also encourage everyone to use Twitter in conjunction with our hashtag, #emic (and #dhsi2010).
It was wonderful to see old friends, and to meet new ones, too. We’ve got a great group of folks who comprise a strong EMiC team. I am really looking forward to this week at the DHSI!