A Macron Signifying Nothing: Revisiting The Canterbury Tales Project Transcription Guidelines

The original transcription guidelines of The Canterbury Tales Project were first developed by Peter Robinson and Elizabeth Solopova in 1993. Since then, the project has evolved and expanded in scope, bringing about numerous changes of varying degrees to the process of transcription. In this article, we revisit those original guidelines and the principles and aims that informed them and offer a rationale for changes in our transcription practice. We build upon Robinson and Solopova’s assertion that transcription is a fundamentally interpretive act of translation from one semiotic system to another and explore the implications and biases of our own position (e.g. how our interest in literature prioritizes the minutiae of text over certain features of the document). We reevaluate the original transcription guidelines in relation to the changes in our practice as a means of clarifying our own position. Changes in our practice illustrate how the project has adapted to accommodate both necessary compromises and more efficient practices that better reflect the original principles and aims first laid down by Robinson and Solopova. We provide practical examples that demonstrate those same principles in action as part of the transcription guidelines followed by transcribers working on The Canterbury Tales Project. Rather than perceiving this project as producing a definitive transcription of The Canterbury Tales, we conceptualize our work as an open access resource that will aid others in producing their own editions as we have done the heavy lifting of providing a base text.

Introduction
[...] abbreviation (PPEA "Transcriptional Protocols"). Our own transcriptions, in the interest of transparency and accessibility, encode abbreviation markers in the <am> element and our interpretation of their corresponding expansions in the <ex> element. Our treatment of the same example would render as "<am>ꝓ</am><ex>pro</ex>pt<am></am><ex>er</ex>", allowing the reader to see how we arrived at our interpretation of the text.
The CTP has also developed many facets of its transcription practice from the innovations of other full-text transcription projects. Our use of the apparatus (<app>) tag is explicitly modelled on, and extends from, the system used by The International Greek New Testament Project and its collaborators (see pp. 13-14). We also borrow from Prue Shaw's edition of Dante's Commedia for our version of the <app> tag and draw inspiration from the same for many of our decisions regarding abbreviations (see p. 17).

Encoding Physical Features in Addition to our own Hierarchical Structures
As we translate from one semiotic system to another through the process of transcription we sometimes create redundancies and doubly encode hierarchies present in the primary source witness. This is the result of needing to encode divisions between sections (e.g. tales and links) in machine-readable language while still recording the textual signifiers (e.g. rubrics) meant to convey those same divisions to human readers. For instance, our transcription encodes boundaries between sections of the text through the use of the <div> tag but also retains rubrics that serve the same purpose in manuscript witnesses. In Figure 1, we can see the rubric clearly denoting a separation between the end of The Squire's Tale and the beginning of The Franklin's Prologue. However, even if this rubric were not present, we would still need a machine-readable way to record this transition for the sake of our collation.
As a result, we record this transition as the end of one division (</div>) and the beginning of another in machine-readable XML (in this case, <div n="L20">, to denote the beginning of Link 20). However, we also want to record that the scribe's own practice in the primary document marked this separation, not only because it informs the medieval reader of distinct sections of the text but because such rubrics can indicate the relationships between textual witnesses. To this end, our transcription (Figure 2) also encodes the two portions of the single rubric: one as an explicit at the end of The Squire's Tale and the other as an incipit at the beginning of The Franklin's Prologue.
This constitutes encoding the same information (that one section ends and another begins) in two distinct ways: at the level of the text and at the level of the document. One explicitly captures this division in machine-readable form, while the other is a human-readable representation of the semantic statement of the division in the primary source document.
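The double encoding described above can be illustrated with a minimal Python sketch. Only the <div n="L20"> label comes from the text; the <rubric> element, its type values, the "SQ" label, and the placeholder wording are assumptions made for illustration and are not the project's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. <div n="L20"> follows the article; the <rubric>
# element, its type values, the "SQ" label and all text are placeholders.
SAMPLE = """
<text>
  <div n="SQ">
    <l>(last line of the tale)</l>
    <rubric type="explicit">(explicit portion of the rubric)</rubric>
  </div>
  <div n="L20">
    <rubric type="incipit">(incipit portion of the rubric)</rubric>
    <l>(first line of the link)</l>
  </div>
</text>
"""

def boundaries(xml_text):
    """Return (division label, [(rubric type, rubric text), ...]) pairs."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for div in root.iter("div"):
        rubrics = [(r.get("type"), (r.text or "").strip())
                   for r in div.iter("rubric")]
        result.append((div.get("n"), rubrics))
    return result

# The <div> labels give the machine-readable boundary; the rubrics record
# what the scribe wrote to signal the same boundary to a human reader.
for label, rubrics in boundaries(SAMPLE):
    print(label, rubrics)
```

The division labels alone are enough for collation to align sections, while the rubric text stays available for comparison between witnesses.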
Our system offers readers "diplomatic" and "edited" views of our digital transcriptions, but our use of these terms requires some qualification. We do not use "diplomatic" in the strict sense Elena Pierazzo puts forward, in which a diplomatic transcription "reproduces as many characteristics of the transcribed document (the diploma) as allowed by the characters used in modern print. It includes features like line breaks, page breaks, abbreviations and differentiated letter shapes" (2011). We use "diplomatic" in a more qualified sense. Although our transcriptions include line breaks, page breaks, and abbreviations, our aim is not to reproduce the characteristics of the document as nearly as our medium allows. In our practice, a certain degree of regularization takes place at the level of transcription so that, for instance, we do not differentiate between distinct letter forms (excepting capitalization, discussed below). Similarly, for the sake of an economy of effort in handling the inconsistency of scribal practice, we use only a limited set of abbreviation marks rather than all characters potentially available. Our use of "diplomatic" is perhaps best understood, therefore, in contrast with "edited" in the context of the viewer in our Textual Communities platform; the principal difference is that in our "diplomatic" view, abbreviations are represented by abbreviation marks, while in our "edited" view those marks are replaced by their expansions shown in italics.
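The difference between the two views can be sketched in a few lines of Python. This is a minimal illustration, not the Textual Communities implementation: the <w> wrapper and the sample word are invented, and asterisks stand in for the italics of the "edited" view.

```python
import xml.etree.ElementTree as ET

# Invented sample word; the <am>/<ex> pairing follows the article,
# the <w> wrapper and the word itself are assumptions.
SAMPLE = "<w><am>ꝓ</am><ex>pro</ex>pre</w>"

def render(elem, view):
    """Flatten an element to text; view is "diplomatic" or "edited"."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "am":
            # Abbreviation marks appear only in the diplomatic view.
            if view == "diplomatic":
                parts.append(child.text or "")
        elif child.tag == "ex":
            # Expansions appear only in the edited view, marked as italic.
            if view == "edited":
                parts.append("*" + (child.text or "") + "*")
        else:
            parts.append(render(child, view))
        parts.append(child.tail or "")
    return "".join(parts)

word = ET.fromstring(SAMPLE)
print(render(word, "diplomatic"))
print(render(word, "edited"))
```

Because both the mark and its expansion are stored, either view can be regenerated from the same transcription without re-editing the text.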
Another example of incorporating physical features of the document into our encoding involves marginalia of various sorts. We use distinct tags for different types of marginalia so that we both record marginal text and encode its perceived function on the manuscript page. The most common example is the running header. Here, " ¶ Chawcer" signals that the text below belongs to Chaucer-the-character's Tale of Sir Thopas, which we encode as: <fw type="header" place="tm"> ¶ Chawcer</fw>. Here the type "header" indicates that the contained text functions as the page's header and "tm" signifies its location at the top middle of the page.
In addition to headers, we encode footers, catchwords, manuscript signatures, and page numbers, each with its own tag. Marginalia that do not fit these types are recorded in a <note> element with an indication of their location. One exception is the paraph mark (¶), which we encode at the beginning of the line rather than as marginalia.
Similar issues arise when editing The Tale of Sir Thopas, where an encoded version of its tail rhymes can hardly reflect the cues expressed in the document. In many manuscripts, this tale is written with offset tail rhymes linked together by brackets, as can be seen in the figure below. Here, the brackets act as a visual cue indicating the order in which one reads the text. Our intention, however, is not to reproduce the visual effect of the document but rather to encode our interpretation of it and to translate its hierarchical structure into our digital system. Accordingly, we encode these passages as if they were written in consecutive order from top to bottom. In this instance, therefore, the line beginning with "His spere" is recorded first, that beginning with "The heed" is third, while "In londe" is seventh. We freely admit that something of this complex visual structure is necessarily lost when translated into our digital form, but what is lost is incidental to our primary aim of collation.
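As a rough illustration of how such typed marginal features might be read back out of a transcription, the following Python sketch lists every <fw> element with its function and position. Only type="header" and place="tm" are attested above; the <page> wrapper, the "catchword" and "br" values, and the placeholder text are assumptions.

```python
import xml.etree.ElementTree as ET

# The header line follows the article; everything else is hypothetical.
SAMPLE = """
<page>
  <fw type="header" place="tm"> ¶ Chawcer</fw>
  <fw type="catchword" place="br">(placeholder catchword)</fw>
  <l>(line of verse)</l>
</page>
"""

# Only "tm" (top middle) is documented in the text; "br" is a guess.
PLACES = {"tm": "top middle", "br": "bottom right"}

def forme_work(xml_text):
    """Return (function, position, text) for each <fw> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (fw.get("type"),
         PLACES.get(fw.get("place"), "unknown"),
         (fw.text or "").strip())
        for fw in root.iter("fw")
    ]

for function, position, text in forme_work(SAMPLE):
    print(function, "|", position, "|", text)
```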

Transcription as an Ongoing Process
Many of our transcription practices have changed since the original guidelines were documented in the early stages of the project. Some of these have come about through an organic process of refinement (we hope) resulting from an accumulation of experience and, especially, from being forced to find solutions for an abundance of unforeseen challenges. Others, however, are directly related to changes in our technological capabilities. Still others have been changed for more pragmatic reasons in our aim to achieve an economy of effort. The following discussion will address a number of these changes and offer our rationale for employing them.
One particularly significant change to our transcription practice involves the development of the apparatus (<app>) tag, which will require some background to illustrate. First, there is the problem that the <app> tag is meant to solve, namely, "how do we encode scribal alterations?" Here [...] document in a manner that is as interpretation-free as possible while the interpretive act of understanding the distinct readings of that text is treated separately in <orig> and <c1> elements, where the number indicates which scribe is responsible for the correction (Bordalejo 2013). This system has been only slightly adjusted for the CTP in that we use the clearer <mod> (modified) element, because most of the time it is not possible to distinguish between multiple correcting hands.
A particularly interesting example of an <app> tag in our project is given in Figure 5 below and is encoded as follows:
    <app>
      <rdg type="lit">sort<seg rend="int">une</seg></rdg>
      <rdg type="orig">sort</rdg>
      <rdg type="mod">fortune</rdg>
    </app>
Here, the <lit> element represents the text of the document regardless of which reading it is judged to belong to, while the <orig> (original) and <mod> (modified) elements contain the transcriber's interpretation of the variant states of the text, namely, what the scribe originally wrote (<orig>) and what the transcriber understands the correction to signify (<mod>). One complicating factor in the example above is the initial letter which, although it undergoes no physical alteration, is nevertheless understood to change in meaning. The modified reading of the line in question renders in our transcription as: "Were it by auenture, fortune, or caas" (Textual Communities, Ne 12v). The context of the line informs our interpretation of both readings. The Middle English word "sort", meaning "fortune" or "lot", makes far more sense than "fort" in the original reading and corresponds logically with the scribal modification to "fortune" ("Sort, n.1" 2020). The reading "fortune" is also a much more likely reading in general than the awkward "sortune": if the final "une" were not an interlinear addition, most readers would doubtless not hesitate in understanding "fortune." The judgment to understand "fortune" here is further corroborated by the scribe's occasional inconsistency in crossing the letters "f" and long "s." Indeed, on the very next line we see evidence of this in Figure 6, where the initial "s" of "soth" has a more emphatic cross than many instances of the scribe's "f." Whether or not one agrees with our judgments concerning the readings "sort" and "fortune," the example above highlights the advantage of using our form of the <app> tag.
Rather than merely offering a set of interpretive judgments in isolation, the <app> tag instantiates our value of transparency. It provides readers with the information our judgments are based on as well as the means to evaluate those judgments, and to disagree should they see fit to do so. The <app> tag signals to the reader that something interesting is happening in the document and emphasizes that what we transcribe is an interpretation (Bordalejo 2013).
We do not use <app> tags in all cases of scribal alteration, but only for those which introduce a substantially variant reading or in cases where we wish to highlight instances of variation. This is done for the sake of an economy of effort and to avoid potential complications during collation, but most importantly because doing otherwise could constitute a misunderstanding of the text. For example, an underdotted false start of a single letter is clearly not to be understood as an alternate reading of the following word. Therefore, it is simply encoded within a <hi rend="ud"> tag, which is easily recognizable during collation. The following example (Figure 7) is taken from the Lansdowne manuscript. This passage is transcribed as "Vpon hire <hi rend="ud">h</hi> chere", without an <app> tag. To use an <app> tag in such a case would be to treat the underdotted "h" as a variant reading of "chere" or as an altogether separate word which the scribe began to write, but never fully executed.
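To make the three layers of the "sort"/"fortune" example concrete, here is a minimal Python sketch that separates the reading of the document from the two interpreted states. The XML is the example quoted earlier; the function name and the dictionary output are illustrative assumptions, not part of the project's tooling.

```python
import xml.etree.ElementTree as ET

# The <app> example from the article, written on one line.
APP = (
    '<app>'
    '<rdg type="lit">sort<seg rend="int">une</seg></rdg>'
    '<rdg type="orig">sort</rdg>'
    '<rdg type="mod">fortune</rdg>'
    '</app>'
)

def readings(app_xml):
    """Map each reading type to its plain text, ignoring inner markup."""
    app = ET.fromstring(app_xml)
    return {rdg.get("type"): "".join(rdg.itertext())
            for rdg in app.findall("rdg")}

r = readings(APP)
print(r["lit"])   # what stands on the page, interlinear "une" included
print(r["orig"])  # what the scribe is judged to have written first
print(r["mod"])   # what the correction is judged to signify
```

A reader who disagrees with the "mod" judgment still has the "lit" reading, with its interlinear segment, from which to argue for another interpretation.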
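One way a collation step could recognize and set aside such false starts is sketched below in Python. That underdotted text is dropped before collation is an assumption made for this example, as are the <l> wrapper and the function names; the article states only that the tag is easy to recognize.

```python
import xml.etree.ElementTree as ET

# The Lansdowne example from the article, wrapped in a hypothetical <l>.
LINE = '<l>Vpon hire <hi rend="ud">h</hi> chere</l>'

def _flatten(elem, skip_rend):
    """Concatenate text, omitting <hi> elements whose rend is skipped."""
    parts = [elem.text or ""]
    for child in elem:
        skipped = child.tag == "hi" and child.get("rend") in skip_rend
        if not skipped:
            parts.append(_flatten(child, skip_rend))
        parts.append(child.tail or "")
    return "".join(parts)

def collation_text(line_xml, skip_rend=("ud",)):
    """Return the line as plain text with whitespace normalized."""
    flat = _flatten(ET.fromstring(line_xml), skip_rend)
    return " ".join(flat.split())

print(collation_text(LINE))                # false start omitted
print(collation_text(LINE, skip_rend=()))  # false start retained
```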
Despite its strengths, we recognize that the <app> tag is not perfect. One issue is that in our system nothing binds together all examples of specific types of scribal alteration. For example, though we record the physical marks of underdotting, overwriting, erasure, and strikethrough with specific tags within the <lit> element, these activities are not explicitly categorized together as deletions within our transcription. As a result, there is no direct way to search for all instances of deletion; one would instead need to search separately for instances of underdotting, strikethrough, etc. This would not be the case if we followed the TEI recommendation of using the <add> and <del> elements, but we maintain that addition and deletion are construed by the reader and are not features of the document. It is therefore fitting that they should remain at the level of interpretation for our readers as well and not be features of our transcription. Each <app> tag is to be individually interpreted in order to understand the change it encodes.
[...] This reasoning is sound, but it reflects a reliance on a particular feature of the TEI guidelines that we no longer strictly adhere to. The statement that scribal inconsistency "forbade certain assignment of any one phonetic value to any one sign" is particularly telling. It points to the recommendation in the TEI guidelines to create a list of character definitions for each brevigraph (TEI Consortium, 11.3.1.2), but it is precisely this fundamental step that is impossible to achieve for such a multifarious and inconsistently used set of brevigraphs. If this task was unmanageable for transcribing only The Wife of Bath's Prologue, how much more problematic would it be when [...] of Cp and Ha4 supports the argument that the two manuscripts are written by the one scribe," carrying out such detailed work for the entire corpus of witnesses is a monumental task (Robinson and Solopova 42).
Developing such descriptions and guidelines for each manuscript takes valuable time and resources, especially in light of how little such a distinction matters during the process of collation itself. As a result, the current project guidelines regarding capitalization in verse have been streamlined: "We transcribe capitals when the letter form in the manuscript is emphatic, that is, different from the regular lower case letter. However, we always transcribe a capital at the beginning of the line, whether the letter is upper or lower case" ("Capitalization" Bordalejo and Robinson).
While this might seem an extreme change, it clearly benefits the economy of effort on the project.
First, there is little to no practical change to the outcome of collation, a primary end of our transcription. Second, not only do project leaders not have to expend time and energy on these guidelines, transcribers (whether paid or volunteer) no longer need to learn the practices for each manuscript. The confusion of manuscript-specific guidelines almost certainly costs the project in errors as well as time.
The more straightforward and consistent we can make our transcription practices, the better chance we have to make fewer errors and to spend less time clarifying those rules to transcribers. Incorporating an economy of effort into our transcription guidelines, though it may sometimes require us to give up a particular level of specificity, better serves the project as a whole without compromising the transcription and collation processes. Difficult as it may be, we have to determine what is practically best for the project in terms of time and resources. This entails aiming for a sound base transcription that sets the stage for many potentially interesting projects and making our transcription freely accessible for others to pick up the project's work and adapt it for research of a greater specificity.

Improvisation and the Need for Flexibility
While we do our best to provide clear instructions for the treatment of uncommon circumstances in transcription, there are certain cases where we must depend on the transcriber's ability to discern and interpret the manuscript without relying upon rigid guidelines. Sometimes, this is because we cannot anticipate every permutation of an abbreviation or the inconsistent ways in which particular scribes adapt conventions of abbreviation. For instance, there are relatively standard expansions for macrons such as an "m" or "n". However, a scribe might use a macron to abbreviate many different letters and often context is the best way to interpret the appropriate transcription. The same is true with how scribes record sacred names. It is often quite clear that the intended name is "Jesus" or "Jerusalem" but the abbreviated forms scribes might use vary.
Rather than develop rigid and byzantine guidelines on every question of interpretation specific to each manuscript, we instead rely upon the transcriber to interpret the manuscript based upon their own experience with that manuscript and others, the context of the particular passage they are transcribing, and their knowledge of the guidelines we do have in place. If a transcriber still cannot make a decision on a unique case, they can raise the problem with the project leaders, who will make a judgment based on our transcription principles and aims. The structure of our transcription team also ensures that senior transcribers and project leaders review these unique cases, and indeed all transcriptions. The following examples demonstrate certain instances where relying upon improvisation is preferable to a more complete guideline structure.
Perhaps one of the best instances of this need for flexibility and improvisation is the abbreviation of sacred names. The ubiquity of nomina sacra such as "Jesus", "Christ", or "Jerusalem", and perhaps the fact that they do not look particularly similar to other names, results in various configurations of their abbreviation.
For example, the name of Jesus, often rendered as "Ihesu" by medieval scribes, can have a number of spellings and abbreviations. Originally, our guidelines recorded the many instances we found because of Robinson and Solopova's interest in the use of the macron in abbreviated forms of "Ihesu". Thus, abbreviations such as "Iħus", "Iħu", "iħu", and "iħc" are explicitly mentioned in the original guidelines (Robinson and Solopova 32). Abbreviations can even feature Greek letters, as in "xħu" for Ihesu or "xp" as an abbreviation for Christ. In our current full transcription guidelines, we record instances such as these in a section of "more complex abbreviations" (Bordalejo and Robinson). We also record some of the most common forms of sacred names in our "Quick Start Guide." This includes "Iħu", "Ierlm̄", "xpō", "eccliāste", and "dd" as abbreviations for "Ihesu", "Ierusalem", "Christo", "ecclesiaste", and "Dauid". However, unlike The Cotton Nero A.x. Project, which documents every instance of a unique abbreviation of a nomen sacrum that appears in the manuscript (for instance, it encodes seven distinct abbreviated forms of "Jerusalem"), we opt to record only the most common instances and trust that our transcribers can interpret distinct [...] Our aim is to give the transcriber a large enough sample of the abbreviations so that they might recognize any new formulations a scribe presents and understand how to correctly encode them. The guidelines do not record every iteration or permutation of even the most common nomina sacra. Instead, they rely upon the transcriber to discern the best way to encode these commonly abbreviated words when the scribe has recorded them in an unorthodox manner.
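The approach described above, a short list of common forms plus the transcriber's judgment for everything else, can be sketched as a small lookup in Python. The five pairs are those named in the "Quick Start Guide" passage; the function, its return shape, and the idea of flagging unlisted forms for review are assumptions for illustration only.

```python
import unicodedata

# The five common forms cited from the "Quick Start Guide".
QUICK_START = {
    "Iħu": "Ihesu",
    "Ierlm̄": "Ierusalem",
    "xpō": "Christo",
    "eccliāste": "ecclesiaste",
    "dd": "Dauid",
}

def _norm(form):
    """Normalize so precomposed and combining macrons compare equal."""
    return unicodedata.normalize("NFC", form)

TABLE = {_norm(abbr): expansion for abbr, expansion in QUICK_START.items()}

def expand(form):
    """Return (expansion, needs_review) for an abbreviated form.

    Forms not in the table are not guessed at; they are flagged so that
    a transcriber can decide from context (hypothetical workflow).
    """
    key = _norm(form)
    if key in TABLE:
        return TABLE[key], False
    return None, True

print(expand("dd"))
print(expand("iħc"))  # mentioned in the original guidelines, not in this table
```

Keeping the table short mirrors the design choice in the text: the list teaches the pattern, and anything outside it is referred to a human reader instead of being covered by an ever-growing rule set.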

Conclusion
The many changes in the CTP's transcription practices detailed above are both a reinvestment in its foundational principles and a perspective on the continuing development the project has experienced over the last twenty-eight years. But it is important to acknowledge that even the work of a project as long-standing as the CTP is only a stepping stone to further scholarship. In fact, Robinson reinforced this fundamental principle of the guidelines again and again. In his article, "What Text Really is Not", he frames the goals of the project as a means to give readers access to new texts that they can make themselves: "Our aim over the next decades is to transform the way people read the Canterbury Tales. We want readers to understand just what it is they are reading, so that they can make texts for themselves with new intelligence. If they do this, then our text will be outdated: and frankly, it will not matter then if people can no longer read our text, and I will not care if they cannot. The great promise of electronic editions, to me, is not that we will find new ways of storing vast amounts of information. It is that we will find new ways of presenting this to readers, so that they may be better readers" (Robinson 1997, 50).
Robinson acknowledges the ephemerality of our text and freely gives it up as something that will quickly become outdated once others have access to the CTP's transcriptions. Yet even these transcriptions are not safe, as Robinson predicts an interest in greater levels of specificity and the obsolescence even of the monumental task and resource of the CTP's transcription: "In 100 years time, scholars will be interested in these different letter forms, and will want transcriptions which record them. Our transcripts will be outdated and of no interest to anyone except the occasional digger into archives" (Robinson 1997, 50).
And yet, the contribution of the project remains clear: the transcription is a means to more informed texts, better quality editions, and, eventually, more detailed transcriptions. It has been and continues to be a task that requires tremendous collaboration and effort in service to a wider community of Chaucer scholars and editors. Paradoxically, our task is to be overcome: the ideal end of our entire project is its own obsolescence in the wake of future projects made by those who have become better readers.