Editing Modernism in Canada


June 23, 2010

My DHSI project and experience

Now that I’ve finally made it home from DHSI and had a chance to process the week, I thought I’d share what I worked on in my course and the thoughts that came out of it.

I was in the TEI fundamentals course. Because I am not currently working on my own EMiC project, but rather as an R.A. for Zailig on the P.K. Page project, I decided to mark up a text that interests me. I ended up choosing the opening page of Douglas Coupland’s seminal novel Generation X.

First page of Coupland's novel "Generation X"

I chose this text for a number of reasons. First, it is one of the main texts for my thesis, which meant that I had it with me. Second, and more importantly, I wanted to figure out how to encode the interesting and unusual non-textual elements on the page. The first of these is that the second paragraph break (line 10) isn’t represented by a line break followed by an indentation; instead, it is marked only by the symbol for a new paragraph (¶). (This happens only on the first page of each new chapter, not throughout the whole text.) The second is the photo inset into the text, which breaks up the lines and even forces a soft hyphen in the word “transportation.” To me, both of these features are essential to the text.

By the end of my DHSI course, this is the encoding I had produced for the introductory paragraph.

During my course, I asked the professors how I could go about encoding these two non-textual elements. The short answer was “You can’t.” TEI is meant to encode only the text itself, and the professors explained that these elements were not part of the text, so there was no reason to encode them. When I pushed them, they suggested that I use the ‘rend’ attribute to solve my first problem, the ¶ symbol. As you will notice in my image, the highlighted line contains a new paragraph with the rend values “¶,” “notes-new-paragraph,” and “no line break.” You will also notice that the ¶ symbol does not appear in the text itself; it is simply hidden away in the code. Their argument was that the symbol indicates a new paragraph and nothing more. While I strongly disagree, I left it as they suggested.
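For readers unfamiliar with TEI, the rend approach described above might look roughly like the following. This is a minimal, hypothetical sketch using Python's standard xml.etree.ElementTree module, not a reproduction of the actual file: the paragraph text is a placeholder, the helper name tei_paragraph is invented for illustration, and the three rend values are written as whitespace-separated tokens (with “no line break” hyphenated into one token), which is an assumption about spelling.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; registering it with an empty prefix makes it the default namespace on output.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei_paragraph(text: str, rend: str) -> ET.Element:
    """Hypothetical helper: build a TEI <p> whose @rend records how the break looked on the page."""
    p = ET.Element(f"{{{TEI_NS}}}p", {"rend": rend})
    p.text = text  # the pilcrow is NOT part of the text; it survives only in @rend
    return p


# Placeholder text; rend tokens mirror the three values described in the post (spelling assumed).
para = tei_paragraph("[second paragraph text]", "¶ notes-new-paragraph no-line-break")
print(ET.tostring(para, encoding="unicode"))
```

Printing the element yields a single `<p>` in the TEI namespace whose rend attribute carries the three tokens, while the ¶ itself never appears in the paragraph text, which is precisely the trade-off objected to above.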

As for the second problem, the professors suggested that I simply pretend the image doesn’t exist and remove the soft hyphen from the text. I could always link to a scan of the page, where avid readers could discover the inset image on their own. As you can see from my encoding, I decided to keep the soft hyphen in the word “transportation,” but I did not find a satisfactory way to show that the text is interrupted by an image.

Although this is a very short text, it taught me a number of lessons about TEI, and about what is currently possible with digitized texts.

1)    TEI is, first and foremost, about encoding the ‘text’. While it is possible to encode non-textual elements, there is no agreed-upon method for doing so. TEI has a very clear hierarchy, and the text sits at the top of it.

2)    Encoding a text in TEI, like all forms of editing, is subjective. This can be deceiving, because there are strict rules about what is and is not allowed. But in the end, it is neither scientific nor purely objective.

3)    Every editor should strive to make his or her editorial method as clear and as transparent as possible. I am not convinced that TEI encoding explicitly allows for this.

4)    A good IMT (Image Markup Tool) is definitely needed to handle the design and layout features that TEI encoding alone does not support. Although I am wary of simply telling readers that everything missing from the TEI encoding can be found in an accompanying marked-up image, that is better than offering a TEI-encoded text on its own.

So those are my few thoughts. I am eager to see other EMiC projects as they develop, to learn whether anyone else encounters similar issues and how they are resolved.


