January 20, 2011

In Search of a Digital Humanities Repository

Before I became the newest EMiC Postdoctoral Fellow this past fall, I regularly discussed with my colleagues the lack of a simple editing and publication engine for Digital Humanities scholars and teachers. My field of research, modernist Periodical Studies, is rapidly expanding, and the digital environment promises new ways of archiving and accessing magazines that have been scattered across university libraries around the world. Organizations like the Modernist Journals Project are doing wonders in delivering complete digital editions of periodicals to scholars, but there is no place a professor can go to teach a student how to digitize, OCR (Optical Character Recognition), mark up, and publish a magazine or book for a class project. This was a major problem for David Earle at the University of West Florida, who was teaching an undergraduate section on modernist magazines and wanted his students to produce a digital edition at the end of the course. Earle realized that he had to negotiate a complex field of proprietary software and web expertise to make his course viable. With a bit of elbow grease, Earle and his students started “The Virtual Newsstand from the Summer of 1925.” His class was asked to help “recreate” a 1920s American newsstand; that is, what magazines and papers would the average New Yorker have seen in one of the little kiosks on a warm summer afternoon in 1925? As you can see, the project was a great success, and I hope it is something we can help our EMiC team do in the classroom too.

My primary task this term has been to set up the EMiC Digital Coop and Digital Commons. The Coop will be a closed repository where you will be able to upload everything you have scanned for the EMiC community. The Commons will be the place where you can publish your own digital editions. This will be a public space, so only material that is in the public domain, or material for which you have secured rights, may be published here. I’ve had two questions in mind: what type of system will make it easy to ingest material into the EMiC repository, and what system can we use to publish that material once it is ready? We also want to make sure that our repository uses the best open archival practices available to us today. This ensures that EMiC (your work and mine) will be compatible with other university systems and repositories for many years to come. For example, Susan Brown is in the process of creating the CWRC (Canadian Writing Research Collaboratory, or “quirk”), which promises to be one of the greatest archives in Canada once it goes online. Brown will be building the CWRC on the Fedora Commons framework at the University of Alberta, so it is important for EMiC to create a repository that will work well with this future archive. To this end, we have decided to build our repository on the Fedora Commons framework as well.

Now that we have a framework, how do we create a system that is convenient and easy to use? This has proven to be a very difficult question. As many of you know, the Center for History and New Media at George Mason University has released a powerful publication and exhibition tool called Omeka. Though Omeka is an excellent tool (and it promises to become even more capable), it does not provide everything we need to run an agile and robust repository. After much research, I came across Islandora, a content management system built upon Drupal and created at the University of Prince Edward Island. Islandora provides us with an easy-to-use system that allows us to upload an image file and have an automated workflow create OCR, PDF, and XML files (including TEI) upon ingestion. The team at PEI, including Mark Leggott, Donald Moses, Joe Velaidum, and Kirsta Stapelfeldt, is committed to building open tools for the digital humanities community at large (and they have a digitization lab to die for). We are very impressed with their scholarly model, and we hope that they will use their experience with EMiC to collaboratively build a repository specifically geared towards digital humanists (Islandora is already hard at work in museums and universities around the world).

But before we commit to a system, we need to run vigorous tests to make sure the system we build for EMiC will last long into the future. To this end, Dean Irvine has allocated funds for a study into Islandora and Omeka at the University of Alberta for EMiC use (and if all goes well, perhaps for the CWRC as well). By the end of January, EMiC should be able to announce our findings. Our goal is to provide a complete editing and publication engine not only for our community but for the world at large. How will this happen when we have great tools like TILE, Omeka, and Islandora, which weren’t built specifically to work with one another?

As many of you know, Meagan Timney, the other Postdoctoral Fellow at EMiC, is a talented programmer and teacher committed to the Digital Humanities (and, I’m told, she is also the person who championed the idea of EMiC before it was even a proposal in the Director’s eye). She has agreed to code the necessary APIs to allow our new system to work with Fedora Commons (and thus Islandora) and Omeka. Her work will provide the Digital Humanist community with an important plugin, so users of Omeka and Islandora will be able to edit images on the web and in the repository. For those of you attending DHSI this summer at the University of Victoria, she will be teaching a course on “Digital Editions” (http://editingmodernism.ca/training/summer-institutes/demic/) where students will get to use this new tool in creating their own digital work. I encourage you to sign up for her course if you would like to learn how to digitize, edit, and publish a text to the web.

So, where does this leave us? Well, we hope that by the end of spring someone like David Earle will never again have to look for an editing and publication engine. This also means that we will be ready to start ingesting the material you have all been scanning directly into the repository. We hope EMiC will provide our community with the archive and tools it needs to start producing the texts you want to create from your various archives. We are truly on the cusp of creating an entire framework that will help scholars around the world produce and edit texts that will be nurtured in an open-source and secure repository for many years to come.

