September 25, 2011

Tales from the Rare Books Room – on recording, organizing, and sharing

I have only dipped my toes into the waters of digital humanism. But now that I’ve spent some time thinking about how to store, organize, and share digital information – and especially after a week of learning TEI at TEMiC – I want to learn how to swim. What follows is my roundup of the EMiC-funded RA project I recently finished. I hope that by offering some of the details about my project, we can start a discussion about tools, processes, and protocols for similar projects.

In January 2010, I began working as an RA for Dean Irvine, gathering information towards the annotations for the edition of FR Scott’s poetry that he is co-editing with Robert G. May (Auto-Anthology: Complete Poems and Translations, 1918-84). I am a PhD candidate at McGill, and I was hired by Dean Irvine to examine holdings in the FR Scott collection, which is housed in the Rare Books Room here. Dean wanted me to record annotations in Scott’s books of English-language poetry; this was intended to help the editors make decisions about what to annotate in the edition. I was to be his eyes for this part of the project, sifting through hundreds of books to find what was useful, thus allowing Dean to get the information he needed while staying home in Halifax. It turns out that being someone else’s eyes is a serious challenge, because the two sets of eyes happen to be attached to two different brains. Before I elaborate on this challenge, though, I’d like to explain what this recording of annotations involved.

There are two separate finding aids for the Scott collection. I only found out about the second because, while working in the Rare Books Room one afternoon, I happened to ask to see a hardcopy of the finding aid I’d been working with (I had a scan, but it was not entirely legible). Scott’s books were donated to McGill in two groups: the contents of his Law Faculty office, which were given to the library shortly after his death; and his personal library from his home, which his widow donated in 1988. The first finding aid – the one I had from the beginning, which seems to correspond to Scott’s personal library – was drawn up in 1990, and there is a manuscript note on its title page, indicating that the list needs revision but is complete. For reasons that remain obscure, a second finding aid was drawn up in 1994; it, too, appears to contain books from the second donation, and makes no reference to a third donation (which would easily explain the new finding aid, but which I have no reason to believe occurred). There does not appear to be a finding aid for the Law office books; they are now sitting in boxes, uncatalogued, in the Rare Books Room, because the Law Library decided they weren’t interested in them.

I have offered this detailed account of the collection’s provenance because it is the clearest way to explain the organizational challenges I was facing. Both finding aids contained material that I was being asked to examine, but they were organized differently: the first was grouped by genre and language, and organized alphabetically within those groups, making it relatively easy to pick out the English-language poetry. The second was strictly alphabetical, so that unless I was familiar with the author or title, it was difficult to know whether a work was poetry or prose. To complicate the matter further, Dean had requested that poetry from before 1880 be excluded from my search – this narrowed the field, but made it a bit difficult to spot unfamiliar early works which Scott owned in modern editions. Before I could even begin looking at the books, this data had to be sorted and organized. No mean feat, particularly when I began with documents that had been scanned – OCR, as we’re all aware, is imperfect enough to cause serious aggravation.

And this is where I learned that I have a lot to learn about being a digital humanist. I recorded my findings in Word, because that is what I am used to using to do my work. Not only was it difficult to turn the scanned document into a pretty Word document – it was also tough to organize the entries in Word. In hindsight, it would have been much better to work in Excel. Doing so would have prompted me to think harder about how the data would be used and could be sorted: in retrospect, I envision columns indicating whether there are annotations, ephemera, Scott’s signature, inscriptions from others, and so on, in addition to the kind of discursive analysis that I provided. This raises the important matter of considering, from the get-go, the best tools for the job and the best ways to use those tools.
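To make the idea of sortable columns a little more concrete, here is a minimal sketch in Python of what one row per book might look like. It is only an illustration of the kind of structure I wish I had started with: the field names (including `finding_aid`) and the placeholder values are my own assumptions, not a record of what I actually built.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class BookRecord:
    """One row per book; every field name here is hypothetical."""
    author: str
    title: str
    finding_aid: str          # assumed label for the source list, e.g. "1990" or "1994"
    has_annotations: bool
    has_ephemera: bool
    has_signature: bool       # Scott's own signature
    has_inscription: bool     # inscriptions from others
    notes: str = ""           # room for the discursive analysis


def write_records(records, handle):
    """Write the records as CSV, which Excel can open and sort by any column."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(BookRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def annotated_only(records):
    """Keep only the books that actually contain annotations."""
    return [r for r in records if r.has_annotations]


if __name__ == "__main__":
    # Placeholder rows, not real entries from the collection.
    sample = [
        BookRecord("Author A", "Title A", "1990", True, False, True, False, "placeholder note"),
        BookRecord("Author B", "Title B", "1994", False, False, False, True),
    ]
    buffer = io.StringIO()
    write_records(sample, buffer)
    print(buffer.getvalue())
```

The point of the yes/no columns is that a question like "which books have annotations?" becomes a filter rather than a re-reading of every entry, while the notes column still holds the prose.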

A related problem, as I mentioned above, is the difficulty of being another person’s eyes. Although I knew the purpose of the information I was gathering, it was difficult for me to know what was important information, so I took note of almost everything. Again, in retrospect, I’m pretty sure that taking note of the inscriptions to Scott, particularly in books that were otherwise un-annotated, was not an effective use of my time. Ultimately, I spent 10 hours this summer creating an Excel sheet that contained almost everything but the inscriptions. Such retrospective claims are intended as a reminder to me – and I hope to you – of the importance of thinking through what information we really want to gather, and how it should be organized. (I do, however, have a complete record of who wrote what in Scott’s books of poetry… you know, just in case.) Naturally, there are also things I might have taken note of that I didn’t: I did not regularly take note of the condition of the book, but only noted particularly interesting cases. I can imagine that knowing whether a book looked read or not might, in the end, be helpful in deciding whether a perceived allusion to something in that book was real.

A final point about data sharing: though I was sometimes able to provide a discursive account of an annotation, often it was necessary to have the pertinent page (or the ephemera) scanned. McGill’s Rare Books Room does not have self-service scanning; each scan costs 25 cents, there is a limit of 10 scans per book, and a detailed scan form had to be completed to order the scan. Each image is returned as its own JPG with an incomprehensible numeric file name. So, when I had multiple scans from the same book, I needed to convert them to PDFs, combine them, and rename the file so that it could be clearly associated with the book which it represented. Then, I had to upload those files to Google Docs, so they could be shared with Dean. All of this made for a really cumbersome process – one which I hope it might be possible to streamline should a similar case arise in the future. Perhaps it’s even possible to do this better now, but I’m just not aware of the tools to do it.
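One part of that process, the renaming, looks like something a short script could take over. The sketch below is a guess at how that might work in Python, not a tool I used: it assumes I keep a hand-made list of which numeric file belongs to which book, and that sorting the file names reflects the order of the scans. The function names and the naming pattern are hypothetical, and merging the images into a single PDF would need a separate imaging library, which is not shown here.

```python
import re
from pathlib import Path


def slugify(text):
    """Lowercase the text and turn runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def plan_renames(scan_to_book):
    """Build a mapping from each scan's file name to a readable name.

    scan_to_book: dict of original file name -> (author, title),
    assumed to be recorded by hand when the scans are ordered.
    """
    counters = {}
    plan = {}
    for original in sorted(scan_to_book):  # assumes name order matches scan order
        author, title = scan_to_book[original]
        stem = f"{slugify(author)}_{slugify(title)}"
        counters[stem] = counters.get(stem, 0) + 1
        suffix = Path(original).suffix.lower()
        plan[original] = f"{stem}_p{counters[stem]:02d}{suffix}"
    return plan


def apply_renames(folder, plan):
    """Rename the files on disk according to the plan."""
    folder = Path(folder)
    for old_name, new_name in plan.items():
        (folder / old_name).rename(folder / new_name)
```

Building the plan separately from applying it means the new names can be checked by eye before any file is touched.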

So… that’s what I’ve been up to. I hope that this narrative will be useful in helping other members of this community think about how we gather, organize, store, and share information. If there are tools out there that would have helped with this work, I’d love to hear about them… and I’d really like to hear your ideas about how we can plan a project so that we are using the available tools to their best advantage from the get-go. And in the meantime, I’m looking forward to signing up for more swimming lessons.

June 7, 2010

My New Word of the Day: Prosopography

Yes, my friends, I have learned a new word this afternoon.

Prosopography.  Check it:


June 7, 2010

Initial Reflections from Day One: Lunch

Here it is, lunchtime on day one of the DHSI.  As I happily munch on lunch with my fellow roommates, I feel a tinge of jealousy that I can’t retake the TEI Fundamentals course.  This year, we are lucky enough to have 14 of the 19 EMiC participants enrolled in that class.  Having that large a group to commiserate with is very helpful at the early stages of learning a new language.  As P.K. Page struggles and goes silent because of the overwhelming nature of learning Portuguese in Brazil, so I too struggled with learning a language of angle brackets and abbreviations that has been a bit suppressed since my last visit to Victoria.

Returning now with a fresh face, I feel re-engaged with the digital tools.  My new course, Transcribing Primary Sources, is much more invested in the bibliographic and social text features of the text.  Matt just spent half an hour talking about all the ways you can describe the scribes who wrote the text and how to mark specific regional geography to “map” the transmission of the text.  How awesome is that?

Because lunch is fast wrapping up, the last piece of news I want to share is about our afternoon project.  I am doing fill-in-the-blank digital mark-up!  This guy definitely understands my abilities.  I get to go hunting for the right information, but I also have the safety blanket of knowing that in this case there is a “right answer” which I can try to find.

Back to work, and I can’t wait to talk (and read!) about your experiences at DEMiC today!