A deceptive simplicity

§ 1 Producing a TEI-conformant diplomatic edition of Huntington Library manuscript Hm 114 (Ht) under the aegis of the already TEI-conformant Piers Plowman Electronic Archive (PPEA) would seem to be a simple enough task. Such an edition would, after all, be produced as part of a larger project for which many tools had already been developed, guidelines written, and much experience gained. Presumably all one would need to do would be to acquire high-quality color digital images, transcribe the text, mark it up according to the TEI and PPEA guidelines, proof against the manuscript, write up a standard description, and be done.

§ 2 Actual practice, however, proved far more difficult. Ht is a rogue manuscript whose non-standard features disrupt even some of the simplest applications of standard TEI and PPEA conventions. Even something as simple as the numbering of lines proved fraught with difficulty.

§ 3 As a result, the Ht diplomatic edition project was transformed as it unfolded. What began as a straightforward attempt to model and document an interesting artifact became a testbed for new methods of markup and for a new conceptualization of how manuscript evidence could be brought to light. At the same time, the project remained constantly subject, with greater or lesser degrees of hopefulness and success as the work went on, to the demands for conformance made by the Piers Plowman Electronic Archive, TEI P4, and the emerging TEI P5 recommendation.

§ 4 The lessons learned provide a means of opening a dialogue on the experimental use of markup, as well as on the conditions, both social and technical, under which an editor of average and frequently self-acquired technical skills might make a start at exploring issues for which recommendations have not already been made, without condemning his or her project to a thoroughgoing idiosyncrasy.

Huntington Library, Hm 114 (Ht)

§ 5 Ht is a highly eccentric manuscript, with contaminations at all levels from exemplars related to the B-text tradition of Piers, the A- and C-text traditions, spurious lines that are probably of the scribe's own invention, and interpolations from sources other than Piers.^[1] Certainly, Ht has not been the fair-haired child of Piers Plowman textual critics. George Kane and E. Talbot Donaldson chose to keep the manuscript's readings out of the apparatus to their edition of the B-text because of the very features that make it an interesting test of the limits of current markup standards. Their estimate that "[i]nclusion of its variants would more than double the size of the critical apparatus" (Kane and Donaldson 1988, 15) is incorrect perhaps only in being an underestimate. Their concurrence with W. W. Skeat that Ht is "one of those MSS. which are best avoided" (Kane and Donaldson 1988, 14-15; Skeat 1873, xx n.) is unarguable from the point of view of an editor whose goal is the restoration of archetypal readings by recension.

§ 6 From the very outset then, an edition of Ht could hardly be undertaken as part of the standard work of the PPEA project, which is devoted to applying electronic editing technologies to the analysis of A-, B-, and C-text manuscripts in order to illuminate their relationships as the basis for constructing recensional archetypes (Duggan 1993, 68-69, and Duggan 1994; Adams 2000 and Adams 2002, 123). The chief interest of Ht or any other eccentric manuscript within the scope of the overall PPEA project lies in its unusual scribal activity and the evidence it affords of the free combination of various genetic groups in the Piers tradition. It might afford some insight into what states of the Piers Plowman text and what range of exemplars were available to a typical early fifteenth-century scribe, even if Ht would be likely only on occasion to have preserved an archetypal reading not witnessed by another surviving manuscript. A published electronic edition of the manuscript would additionally provide textual critics and fifteenth-century specialists with a model and facsimile of an artifact constructed by an undeniably avid reader of Piers in all its forms, working in the generation after Langland's death.^[2]

Ht and the PPEA

§ 7 In their thorough examination of the manuscript, George Russell and Venetia Nathan suggest that Ht holds interest not because of its value for recension, but because it contains, apparently, a carefully edited version of the poem made by one who had before him all three texts of the poem and who sought to produce from their conflation a composite version which would incorporate what he regarded as the best material from all three (Russell and Nathan 1963, 119). This assessment of the manuscript's historical significance, made more than forty years ago, has been met by a growing consensus of subsequent scholars (e.g. Seymour 1974, Scase 1987, and Hanna 1989 and 1996) who have recognized the importance of the study of the Ht scribe's activity for its own sake.

§ 8 Because this rogue manuscript could not simply be edited as a rogue project if its results were to have any relevance to the overall PPEA or to the larger scholarly world, it became a test bed for best (and sometimes worst) practices in extension of PPEA and TEI guidelines, as well as an exercise in what can be accomplished by a scholar of only average technical abilities—someone trained almost exclusively in the humanities rather than computer or information science—who lacks institutional support,^[3] but who nevertheless must learn the necessary technologies on the fly because salient features of the document artifact in question seem to be comprehensible only by the application of digital technologies.

§ 9 The measures actually taken, then, to see the project to completion have been a combination of the rigorous application of PPEA and TEI standards wherever possible, with the shameless application of technical duct tape whenever the project hit upon a sort of data or analytical juncture that could not be handled, or not at all well handled, by current standards and available software. Some of these adaptations might have been designed and applied more elegantly—perhaps even within current guidelines—by an editor with greater technical skills, but there was, simply put, no such editor present.

§ 10 Since this is a routine condition of actual electronic editing, the insights gained in the course of the otherwise eccentric Ht project may well have a very wide applicability. Ideally, scholars encountering such difficulties should develop conventions for resolving them that would, over time, be productive of collaborative extended markup standards built inductively upon manuscript evidence—certainly not in real-time, and not without a great effort of communication and cooperation, but nevertheless in accord with new phenomena as they emerge from close examination of the manuscripts themselves.

Inductive markup: analytical proto-markup

§ 11 TEI recommendations and the PPEA protocols based on them tend to presuppose that manuscript features will fall into categories that are already well known and agreed upon in advance. Markup is seen more as a means of display and delivery of information of already known significance than as a tool for gradually discovering significant patterns as they emerge, based on reproducible and falsifiable observation and inference.

§ 12 This is no reproach at all to the massive and very successful efforts of the TEI to provide a standard for the encoding and interchange of text. Rather, it is to say that under certain conditions a means needs to be sought in which the values of rigor and uniformity espoused by the TEI can be adapted to a necessarily experimental project, the boundaries of which may not be fully understood until all or most of the project has been completed (see Cummings 2006 for an example).

§ 13 Some of the extraordinary measures taken in the course of the Ht project—measures considered to be only provisional prior to final markup—represent the first stage of such an adaptation, what might be thought of as analytical proto-markup.

§ 14 Ht presented at many points a mere pile of indeterminate data, the full character and significance of which was neither known nor knowable in advance by any simple inspection that could be done at the time of initial markup. Even perfectly TEI-conformant markup would have to be applied in a speculative way as a means of preliminary analysis. That is, such markup would have to be used to encode portions of text or manuscript features as a sort of place holder rather than as a clear-cut intellectual category of known significance. Under such conditions, non-standard forms of markup, when rigorously applied, can unveil patterns that would otherwise remain hidden or identify phenomena of still unknown significance for future analysis and aggregation.

Line numbering: an unexpected test

§ 15 A most unexpected instance of this situation arose immediately after the initial transcription of Ht, at the stage in which line numbers are typically assigned to manuscript transcriptions in the PPEA. In the PPEA, two sets of line numbers are assigned to each manuscript line: one, encoded on the id attribute, indicates the absolute position of the line in the relevant manuscript; and a second, usually assigned automatically by a Perl script to the n attribute, identifies the equivalent line in the relevant Athlone (print) edition. Thus, for example, line 50 of the Ht Prologue is line 50 of the manuscript poem and can also be affiliated confidently with line 50 of the Athlone edition of the B version of the poem (indicated here by the sigil KDP):

<l id="HtP.50" n="KDP.50"> I sawe somme þat seid þei had soght seyntes </l>

§ 16 In most PPEA editions, assigning this canonical line number has proven to be a relatively straightforward task. In a typical Piers manuscript, the scribe copies his text from an exemplar or exemplars representing a single state of the poem—that is, an exemplar clearly in the A, the B or the C tradition—and the affiliation of individual lines is quite predictable. The main exceptions are the so-called splices, in which a manuscript combines large sections drawn from the A text with another large B- or C-text section.

§ 17 Ht, however, is different from these other witnesses in that its mixing of text from the A, B, and C traditions is thorough, unpredictable, and occurs at the level of long passages, single lines, half-lines, phrases, and even single words. As a result, any given line, word, or phrase in its 8620 lines might come from any of the three versions of the text, arbitrarily.^[4] Determining which tradition or traditions a given passage is most closely affiliated with can be accomplished only on the basis of a line-by-line comparison with the extant texts, largely in the hope of finding readings unique to the A, B, or C tradition that also appear in Ht. Localization of these variants—referred to hereafter as discriminant variants—then makes possible the discovery of the filiation of at least some of the Ht scribe's exemplars. Since many lines in the poem are found in substantively identical or near identical forms in more than one version, the affiliation of many passages can be made only tentatively on the basis of such discriminant variants in adjacent lines.
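The test for a discriminant variant can be stated compactly in code. The following Python sketch is illustrative only and is not the project's own tooling: the function names, the dictionary layout, and the placeholder readings are assumptions. Only the reading maistre freres is taken from this article (it is discussed below at line 62 of the Ht Prologue).

```python
def agreeing_traditions(ht_reading, readings_by_tradition):
    """Return, in sorted order, the traditions that share Ht's reading at a locus."""
    normalized = ht_reading.strip().lower()
    return sorted(
        tradition
        for tradition, readings in readings_by_tradition.items()
        if normalized in {reading.strip().lower() for reading in readings}
    )


def is_discriminant(ht_reading, readings_by_tradition):
    """A reading is discriminant when exactly one tradition shares it with Ht."""
    return len(agreeing_traditions(ht_reading, readings_by_tradition)) == 1


# Hypothetical locus: the A and C entries are placeholders, not real readings.
locus = {
    "A": ["placeholder reading"],
    "B": ["maistre freres"],
    "C": ["placeholder reading"],
}

print(agreeing_traditions("maistre freres", locus))
print(is_discriminant("placeholder reading", locus))
```

A reading shared by two or three traditions, or by none, returns a list of some other length and so contributes nothing to the identification of the scribe's exemplar at that point.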

§ 18 This process obviously requires a more thorough collation than the preliminary scan that has thus far proved sufficient for all other manuscripts in the PPEA. As a result, at the earliest stage of the markup, the edition of Ht was able to identify passages only in terms of their absolute position—supplying a value for the id attribute, but leaving the intra-textual reference n empty:

<l id="HtP.1"> In a somer sesoun whan softe was þe sounne </l>

Collation by approximation

§ 19 The question then remained: how best to discover, if possible, the filiations of the exemplars of Ht? Since too few of the electronic transcriptions of the surviving Piers manuscripts had been fully proofed, each and every reading of Ht—amounting to about 88,000 words, together with an incalculable number of word-order and omission variants—would somehow have to be collated against all three Athlone texts (representing something over 200,000 words of comparative readings). Thereafter, Ht would also have to be collated against the not inconsiderable number of variants recorded only in the three Athlone apparatus—apparatus in which, for the reasons stated earlier, Ht itself does not appear. Doing either or both of these collations in a straightforward way, word by word and phrase by phrase, would have been prohibitively time-consuming and most likely would have abounded in error beyond any acceptable level.

§ 20 Were this to have been the only available method, it might have resulted in either scrapping the project, or giving up on using n to provide canonical references to the Athlone editions. Since such a measure would almost completely vitiate the goal of discerning the scribe's likely exemplars—one of the project's principal warrants—a simple method of discovery in easy stages was undertaken. Since completed and proofed transcriptions of all the surviving witnesses to Piers A, B, and C were not available for machine collation, the method had to be based on the Athlone print editions while minimizing wherever possible both work time and the multiplication of error.

Identifying parallel lines

§ 21 The first problem was to discover simply whether a given line appeared in a single tradition, two traditions, or all three. The first line of the Prologue, cited above, appears in all three traditions, making it a line that might draw its variants from a manuscript related to any one of the other surviving manuscripts and fragments of the poem. Other lines (the first at HtP.7) appear in only two states of the text, greatly limiting the number of manuscripts to which Ht can be related at this stage in its text. Finally, and most usefully of all, there are those cases in which a whole line itself will be a discriminant variant, appearing in only one tradition, thereby narrowing the field of possible affiliation with surviving manuscripts significantly. In these cases alone (the first such in Ht at HtP.50) can a value be assigned with confidence to the n attribute as used by the PPEA without detailed collation:

<l id="HtP.50" n="KDP.50"> I sawe somme þat seid þei had soght seyntes </l>

§ 22 Certainly, marking up such lines, dubbed unaries, according to TEI and PPEA recommendations would prove useful for later analysis of the filiation of Ht, since any strong patterns of affiliation could be better relied upon not to have been the result of contamination from the other traditions. Nevertheless, if the number of such lines proved very small (as in the case of the A-text unaries, which number only 143), one might wish to supplement this data set with those binaries and ternaries that proved to have discriminant word- or phrase-level variants that would identify them as affiliated with only one tradition.
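The sorting of lines into unaries, binaries, and ternaries amounts to counting the traditions in which a parallel was found. The sketch below, in Python, assumes a simple mapping from line identifiers to tradition letters. The function names and the entry "hypothetical.binary" are invented for illustration, while the entries for HtP.1 and HtP.50 follow what is said of those lines above.

```python
STATUS_BY_COUNT = {0: "Ht-unique", 1: "unary", 2: "binary", 3: "ternary"}


def classify_line(traditions):
    """Classify a line by the number of traditions (A, B, C) with a parallel."""
    found = set(traditions)
    unknown = found - {"A", "B", "C"}
    if unknown:
        raise ValueError(f"unknown tradition(s): {sorted(unknown)}")
    return STATUS_BY_COUNT[len(found)]


def unary_pool(parallels_by_line):
    """Collect the lines found in exactly one tradition, with that tradition."""
    return {
        line_id: next(iter(set(traditions)))
        for line_id, traditions in parallels_by_line.items()
        if classify_line(traditions) == "unary"
    }


parallels = {
    "HtP.1": "ABC",               # appears in all three traditions
    "HtP.50": "B",                # the first whole-line discriminant in Ht
    "hypothetical.binary": "BC",  # invented example of a two-tradition line
}

for line_id, traditions in parallels.items():
    print(line_id, classify_line(traditions))
print(unary_pool(parallels))
```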

Encoding the collations

§ 23 How might one record these sets and subsets in order to make them machine-readable without straying unduly from TEI and PPEA markup standards? Each of the three methods for constructing a reference system described in the TEI recommendation posed problems in the case of Ht. Simply declaring a reference system by the prose method (P4 5.3.5.1 and P4 6.9; P5 5.26.80 and P5 6.40), and then scattering the needed references to all three Athlone editions into various newly declared attributes in each <l> element would have required changing the SEENET DTD and would also have required keeping track of the self-defined data structure on the fly. The milestone method (P4 5.3.5.3 and P4 6.9.3; P5 5.26.80 and P5 6.40.105) presented the further drawback of clashing with the current use of <milestone> by the PPEA to encode folio breaks.

§ 24 The TEI <ref> element seemed a possibly good choice, since it "defines a reference to another location in the current document, in terms of one or more identifiable elements, possibly modified by additional text or comment" (P4 35 <ref>; P5 35 <ref>) and its model could be adapted to the data needed for study of Ht. Its use, however, would not be without some complications. Because this element, like the other two TEI-recommended methods of constructing a reference system, was designed as a means of encoding references to clearly known locations, it naturally does not provide for the use of a cert attribute. At this initial stage, however, most identifications could be only hypothetical, pending examination of the word- and phrase-level variants. Even when this examination was finished, moreover, many lines (perhaps the majority) would remain forever ambiguous in that they would be found to contain no variants present in only one tradition of Piers.

§ 25 Such a reference system is really more a matrix of hypotheses at first, and, if one is lucky, a matrix of hypotheses together with somewhat more and somewhat less clear-cut determinations in the end. Since the TEI recommendation does not and was never intended to take into account such high levels of ambiguity, none of the suggested reference systems would accurately and completely encode what would be known (or more accurately, would not be known) about the filiation of binary or ternary lines at the point at which they had been indeed identified as binary or ternary, yet before their internal variants had been thoroughly examined to find the discriminants that, if present, could identify them with a single tradition or with an even smaller group of manuscripts.

§ 26 Adding a cert attribute to the recommended reference systems or to the <ref> element under P4, using the new <certainty> element of P5 (http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-certainty.html), or conducting a final collation of all variants would not resolve the issue. Even at the end of all the identification of variants it remains unclear what would be certain, and the degree to which it would be certain. What percentage of certainty could be assigned—that is to say with any rigor—to a ternary line with two B-text discriminant variants agreeing with Ht and another different variant from the C-text, but with none from A? Although a real statistical analysis of all the variants witnessed by surviving manuscripts set against Ht would be possible in theory, such an analysis would require that the whole work of identification of variants be completed and encoded for machine analysis first, exceeding by far the practical time and financial constraints on the project.

§ 27 As the recommendation for <certainty> in P5 states (http://www.tei-c.org/release/doc/tei-p5-doc/html/CE.html), the assignment of a level of probability using the new element, as with the cert attribute of P4, typically remains subjective. This is no problem at all in a project in which some dozens or even a few hundred instances would be marked up this way. The scale both in number of instances and in the time needed to complete each stage of the project presented by Ht, however, required a more categorical method, one that would be transparent at later stages of the work and also easily grasped by new minds—semi-skilled proof-readers, for example—grappling with decisions made in the course of collating thousands of lines across three versions of the poem perhaps a year or more after initial encoding. The more unambiguous such decisions could be made, the better.

§ 28 A practical method for indicating whether a line had zero, one, two, or three apparent parallels in the A-, B-, and C-text traditions was therefore developed for use on the fly during the line-level identification stage. This would serve as a guide at the later stage during which variants would be examined individually, based on the preliminary determination of their range of possible sources in the A-, B-, or C-texts. Such an approach would not guarantee that the correct determination would be made in every case, but it would make it much easier to discover whether or not it had.

§ 29 The n attribute of TEI <l> was already being used by the PPEA to record an absolutely certain reference to line numbers in the Athlone editions and thus was unavailable for a different use in the Ht edition. For legacy reasons, the project also hoped to avoid modifying the PPEA DTD, both to prevent the breaking of browsers and other software and to avoid the need to supply a special DTD (and a non-conformant one at that) for a single text within a larger collaborative project.

§ 30 A further, entirely practical, consideration at this stage was equally pressing. Some way of recording this data that would be very compact and easily referenced by eye as work progressed through the manuscript's 8620 lines, across three print editions and several other references, had to be developed in order to minimize as much as possible the likelihood of human error. The human eye may easily skip while focusing on just one text, as witnessed in the work of all the descendants of Adam Scriveyn to this very day, not excluding electronic editors. Normally a text is marked up with the eye trained directly at one screen or at most moving from the electronic transcription to a digital image or to a single standard print edition. The chances of eye skip producing an error while an editor scans seven or more reference sources, all in eccentric spelling, are clearly multiplied. For this reason, it seemed preferable to develop a simple symbolic code, suitable for use as an attribute value, that could be used to reflect basic collation decisions.

§ 31 To avoid changing the DTD, the global attribute rend, as yet unused by the PPEA, was pressed into service at this early stage. This was a deliberate act of attribute abuse, since rend is intended to encode "how the element in question was rendered or presented in the source text" (P4 33 Element Classes: global; P5 Element Classes) and the Ht project was at best using it to describe how elements might be distinguished in the output. The alternatives, however, seemed likely to cause much more damage to the project within its wider PPEA context, and this use of rend could itself be carefully documented elsewhere so as to conform to TEI recommendations at least at the lowest level of conformance.

§ 32 The line-level identification itself was done in three passes (one against the A-text, one against the B-text, and one against the C-text), minimizing the need to shift between printed reference sources and recording only the absence or presence of a line in each version of Piers. A system of symbols was devised to account for the unary, binary, or ternary state of a line as that state gradually became apparent. Additional notation was devised to represent whether or not a given line could be upgraded to a unary status by later identification of discriminant variants, and also to represent clearly which of the three Piers traditions the line could possibly represent, as distinct from the tradition which it would finally be determined to represent. This value was then recorded in the rend attribute of <l>. Thus, for line 62 of the Ht Prologue, the rend attribute after the initial approximate collations were completed contained the following symbols, where each % represents the presence of the line in one tradition and [] contains a value of A, B, and/or C, indicating the tradition or traditions in which a parallel line initially was discovered:

<l id="HtP.62" rend="%%%[ABC]"> Meny of þes maistre freres may cloþe hem at lykynge </l>

§ 33 After this line-level identification, the potential parallels could be examined in the hope that discriminants would emerge, as indeed is the case in line 62, where Ht shares the reading maistre freres only with B-text manuscripts W Hm Cr1 Cr2 Cr3 Y O C2 C L M H. As a result of this subsequent examination, the line also receives a +, a symbol used to identify the line for subsequent processing and addition to the pool of unary lines. To such an upgraded binary or ternary line and to all unary lines, an n attribute value could then be supplied with a high level of confidence; this value represents, in the case of unaries, the unambiguous assignment of the line to the A-, B- or C-text tradition:

<l id="HtP.62" n="KDP.62" rend="%%%+[ABC]"> Meny of þes maistre freres may cloþe hem at lykynge </l>

§ 34 All other lines, whether binary or ternary, would always have to be examined against the two or three traditions to which they might be related, and any determination of their line-level affiliations would have to be made based on the affiliation of the surrounding unary and upgraded lines. It would be only at this late stage that any sort of rigor could be applied to the assignment of certainty values.^[5]
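Because the rend values follow a fixed pattern, they can be generated and checked mechanically. The Python sketch below is one possible reading of the notation as shown in the two examples above (%%%[ABC] and %%%+[ABC]). The function names are invented, and the handling of lines with no parallel at all is deliberately omitted, since no notation for that case is illustrated here.

```python
import re

# One % per tradition, an optional + for an upgraded line, then the traditions.
REND_PATTERN = re.compile(r"^(%{1,3})(\+?)\[([ABC]{1,3})\]$")


def build_rend(traditions, upgraded=False):
    """Build a rend value such as '%%%+[ABC]' from a set of tradition letters."""
    if not traditions or set(traditions) - set("ABC"):
        raise ValueError(f"expected one or more of A, B, C: {traditions!r}")
    ordered = "".join(t for t in "ABC" if t in traditions)
    return "%" * len(ordered) + ("+" if upgraded else "") + "[" + ordered + "]"


def parse_rend(value):
    """Parse a rend value, rejecting any that is internally inconsistent."""
    match = REND_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a collation code: {value!r}")
    marks, plus, letters = match.groups()
    if len(marks) != len(letters) or len(set(letters)) != len(letters):
        raise ValueError(f"count of % does not match traditions: {value!r}")
    return {
        "count": len(letters),
        "upgraded": plus == "+",
        "traditions": set(letters),
    }


print(build_rend("ABC"))
print(build_rend("ABC", upgraded=True))
print(parse_rend("%%%+[ABC]"))
```

A check of this kind could be run over a whole transcription to flag mistyped codes before a later proofing pass.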

§ 35 Occupying not more than the space of nine characters in their encoding, these values for rend could be kept precise and represent known levels of certainty. They are easily processable and renderable in order to enable unskilled or semi-skilled assistance in the later proofing of the matrix of parallel line number references. By using this categorical approach, the project side-stepped problems with the ad hoc determination of certainty by the encoder. Each step in the process represents a defined degree of certainty and project staff working with the text in the future have an easily understood and very clear representation of the collation process used and the reliability of its current state.

§ 36 In this case, abuse of rend seemed to represent both the least damaging and the most transparent way to encode crucial information about the relation of Ht to the A, B, and C traditions of Piers as that information built up gradually. It can of course be easily re-encoded as an entirely new and different attribute later. The particular situation involving heavy contamination from many manuscript traditions may be comparatively rare, at least among manuscripts of Piers, but it is not by any means unique to Ht, nor, surely, is the situation rare in which something ambiguous in a document needs to be transcribed and marked up as a means of eventually discovering its nature through subsequent processing—which may in turn result in a subsequent, more appropriate markup.
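A later re-encoding of this kind is a small mechanical step. The following Python sketch uses the standard library's xml.etree.ElementTree to move the value of rend on every <l> to another attribute. The target name exp is purely hypothetical, the function name is invented, and the sample assumes an un-namespaced, P4-style fragment rather than a full PPEA file.

```python
import xml.etree.ElementTree as ET


def move_attribute(root, old="rend", new="exp", tag="l"):
    """Move the value of attribute `old` to `new` on every `tag` element.

    Returns the number of elements changed. Refuses to overwrite an
    existing `new` attribute.
    """
    moved = 0
    for element in root.iter(tag):
        if old in element.attrib:
            if new in element.attrib:
                raise ValueError(f"{new!r} already present on {element.get('id')}")
            element.set(new, element.attrib.pop(old))
            moved += 1
    return moved


SAMPLE = (
    '<div><l id="HtP.62" n="KDP.62" rend="%%%+[ABC]">'
    "Meny of þes maistre freres may cloþe hem at lykynge</l></div>"
)

root = ET.fromstring(SAMPLE)
print(move_attribute(root))
print(ET.tostring(root, encoding="unicode"))
```

The id and n attributes are left untouched, so existing references into the transcription would continue to resolve.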

Experimental markup as part of a recommendation

§ 37 TEI and PPEA recommendations tend for very laudable reasons to address the most common instances in which the significance of a text string is already known or at least easily discernible before markup commences. Indeed, one of the chief aims of the TEI is to make available a standard of interchange for those aspects of texts which are of recognized importance to the widest possible community.

§ 38 The technologies applied by and applicable to TEI-encoded texts, however, show great promise for the gradual examination and elucidation of as yet unknown or imperfectly known phenomena in a manuscript. A globally available attribute for the encoding of speculative or experimental data of the kind described above would foster rigorous experimentation within a controlled context without deliberate tag and attribute abuse. An x, exp, or even temp attribute would perhaps seem to some to allow for undisciplined markup habits; in actual practice, however, such attributes would foreground the experimental or provisional nature of their values, thus in fact lending both rigor and transparency to a process that under certain circumstances is crucial to making a more determinate decision later—one that might be much more suitable for encoding under current recommendations.

§ 39 Provision of avowedly experimental attributes for existing elements would also support the production of editions of manuscripts containing non-standard features under the real conditions of large electronic projects in which there is usually a significant lag time between the discovery of an anomaly that needs to be marked up in a way not provided for in existing recommendations, the final decision by an editorial board as to how to handle the phenomenon in question, and the retrofitting or new development of ancillary software to support the change. Having available experimental place holders that are already provided for in the standard DTD would allow controlled experiments to be carried out without breaking all or most existing support mechanisms at a stage during which it is uncertain that major changes should be made to accommodate the experiment on a permanent basis, as a received standard of proven merit.

Experimental markup and codico-textual analysis

§ 40 As will be seen in the remainder of this discussion, a series not only of experimental attributes but also of experimental elements based on the models of the existing TEI elements, yet named in such a way as to make clear their experimental nature, would foster the combination of rigorous encoding and the free play of ideas needed for the exploration of text in relation to codex, and of large groups of encoded codices in relation to one another, enabling the computer-aided study of large samples of evidence of scribal practice in copying, correction, mise-en-page, ordinatio, glossing, and use of materials such as vellum and paper stocks.

§ 41 With the development of encoding standards for such codicological features in a way that relates them directly to the texts they contain, clearer insights than are now possible could be gained into what manuscripts may have come from the same workshops, and how practices differed among workshops and scriptoria, even trans-historically and trans-nationally.^[6]

Inverse recension using TEI <app>

§ 42 Although it was hoped that the Ht project could conform to all TEI P4 and PPEA recommendations from this point forth, it became abundantly clear at the very next stage—that of the search for discriminant variants—that a second extension would have to be undertaken, that of adapting TEI <app> to the encoding of witnesses to variants in a documentary edition in which the forms of the variants themselves would be less important than their simple agreement or disagreement with the readings of Ht.

§ 43 When each line in Ht had been shown either to be Ht-unique, or parallel to one, two, or three of the traditions of Piers, examination of the variants could commence. At first a rapid reading against the three Athlone editions was undertaken, chiefly as a means of disambiguating whether a binary or ternary line could be seen to have come from only one of the three traditions by discovery of discriminant variants such as maistre freres, a reading unique to the B-text, as in the earlier example from Ht Prologue, line 62.

§ 44 After this final preliminary stage, serious collation of all the variants and recording of the sigils of all witnesses bearing them could finally begin in order to discover if any relationship existed between Ht and the known genetic groups of surviving manuscripts.

§ 45 The process, however, was vexed from the start. The encoding of the lemmas of Ht set against the readings of surviving witnesses was not possible using a straightforward application of TEI and PPEA recommendations. In this case, the PPEA did not at the time use TEI <app> at all, since editions in the initial PPEA series are documentary, building together into a database of all the readings of all the Piers manuscripts, to be sure, but reserving the majority of such inter-textual analyses for the later stage in which archetypes would be reconstructed.

§ 46 Analysis of Ht's genetic relationships, leading to the discovery of the filiation of at least some of its exemplars, however, could not be at all efficiently done without the use of computing. Typically in the PPEA, a Piers manuscript is read against the recension from which it derives, with special attention given to its agreements with only one or two other manuscripts, or to somewhat larger, known genetic groups. These relationships, explicitly documented in the Athlone apparatus, are then recorded in notes for later collation and analysis in a narrative introduction.^[7]

§ 47 Since the relationship of Ht to the other manuscripts is so much more complex, and since the readings of Ht were not explicitly recorded in the Athlone editions because of its eccentricity (Kane and Donaldson 1988, 15), a means of markup had to be devised in the hope that some means of analytical display could be used to aid in the search for patterns of agreement in as inductive and machine-processed a way as possible—something that could not be done using the discursive textual notes of other PPEA editions, since these would have to be examined later by eye, one by one, in order to discover emerging patterns intuitively, or by constructing one chart or graph after another by hand.

§ 48 The TEI <app> element was thus pressed into service for a sort of inverse recension. That is, unlike traditional recension in which a manuscript is chosen as a copytext and others are collated against it to weigh the merits of their various readings and discover as much as possible the readings of a hypothetical earlier state of the text, the goal in this case would be to see how other genetic groups fed into the hypothetical text that had already been constructed by the Ht scribe—the Ht manuscript itself. The Ht scribe's construction of a gesamt Piers stands before us in the artifact itself. What lies hidden are the traces of the exemplars from which he constructed that text.

§ 49 Although the TEI <app> element was originally intended to be used as a means of representing the information contained in the standard apparatus criticus of a full scholarly edition collating and weighing the readings of many manuscripts, this element was admirably suited as a data model for use in this inverse manner, even though the <rdg> element in this case would never have content meant to be interpolated into Ht as a better reading than that already witnessed by Ht. The <rdg> element would always record only those manuscripts and their readings that disagreed with Ht, while <lem> in turn would contain only the reading of Ht, with its wit attribute recording the sigil of Ht together with the sigils of any manuscripts that agreed with it. The content of <rdg> certainly would form a most useful database for later characterization of the nature of the variants in Ht set against other, less eccentric manuscripts. Indeed, a contrastive analysis of his choices might shed light on the scribe himself, his likely profession, class, level of education, and so forth. At this second stage of the work, aimed chiefly at the determination of the scribe's exemplars, however, the content of <rdg> was relatively unimportant, as long as the wit values were always accurately supplied.

§ 50 Unique readings or even unique lines could in any case be foregrounded for a machine-generated database of readings that are arguably more likely to represent the scribe's own language because they would have a <lem> with a wit value of Ht only, and thus could easily be referenced by a stylesheet. Time constraints set against the very time-intensive nature of TEI <app> tagging and the very irregular nature of medieval spelling made it advisable to supply only a set of variables in place of the actual readings of disagreeing witnesses, as in the following example from Ht5.163, where there is very great divergence in readings among the surviving witnesses:

<app type="variables" loc="Ht5.163">
<lem> treso <expan> ur </expan> &~ tresou <expan> n </expan> be not </lem>
<rdg wit="Cr1 Cr2 Cr3 Y O C2 C Bm Bo Cot L M H" type="s"> {gamma} </rdg>
<rdg wit="F" type="s"> {delta} </rdg>
<rdg wit="R" type="s"> {epsilon} </rdg>
<rdg wit="Hm" type="s"> {zeta} </rdg>
<rdg wit="W" type="s"> {eta} </rdg>
<rdg wit="{sigma}" type="s"> {theta} </rdg>
</app>

§ 51 Here, with the decision having been made to delay supplying the full set of readings for each disagreeing witness, each distinct reading is represented instead by the name of a Greek letter inside curly braces within its own <rdg> element, following each time the order in which they appear in the Athlone apparatus, in the hope that perhaps one day an electronic version of the Athlone editions could be used to supply the very eccentrically spelled readings in their place.^[8]
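
As a rough illustration of how such variable-only entries might be processed, the following Python sketch parses a lightly adapted copy of the Ht5.163 entry and lists the sigils recorded for each placeholder. It is not the project's own tooling. The function name summarize_app is hypothetical; a wit="Ht" value has been supplied on <lem> as an assumption, following the description of <lem> in the discussion above; and the ampersand in the lemma has been escaped so that the sample parses as XML.

```python
import xml.etree.ElementTree as ET

# Adapted copy of the Ht5.163 entry. Assumptions for this sketch only:
# wit="Ht" has been added to <lem>, and "&" is escaped as "&amp;".
SAMPLE = """
<app type="variables" loc="Ht5.163">
  <lem wit="Ht"> treso <expan> ur </expan> &amp;~ tresou <expan> n </expan> be not </lem>
  <rdg wit="Cr1 Cr2 Cr3 Y O C2 C Bm Bo Cot L M H" type="s"> {gamma} </rdg>
  <rdg wit="F" type="s"> {delta} </rdg>
  <rdg wit="R" type="s"> {epsilon} </rdg>
  <rdg wit="Hm" type="s"> {zeta} </rdg>
  <rdg wit="W" type="s"> {eta} </rdg>
  <rdg wit="{sigma}" type="s"> {theta} </rdg>
</app>
"""


def summarize_app(app_xml, base="Ht"):
    """Return the witnesses agreeing and disagreeing with the lemma."""
    app = ET.fromstring(app_xml)
    lem = app.find("lem")
    # Sigils in the wit attribute of <lem> agree with the base manuscript.
    agreeing = lem.get("wit", "").split()
    # Each <rdg> maps its placeholder (e.g. "{gamma}") to its sigils.
    disagreeing = {
        "".join(rdg.itertext()).strip(): rdg.get("wit", "").split()
        for rdg in app.findall("rdg")
    }
    return {
        "loc": app.get("loc"),
        "agreeing": agreeing,
        "disagreeing": disagreeing,
        # The reading is unique when only the base manuscript attests it.
        "unique_to_base": agreeing == [base],
    }


summary = summarize_app(SAMPLE)
for placeholder, sigils in summary["disagreeing"].items():
    print(placeholder, sigils)
```

Because only the wit values are consulted, the same function would work unchanged if the placeholders were later replaced by the actual readings.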

§ 52 As at all junctures in this project, a willing suspension of curiosity about the readings of Ht set against the other surviving manuscripts had to be maintained in order to keep the project within any sort of bounds of time and labor. The <app>-tagged unary lines, even in the absence of the variant readings, have made possible the development of XSLT stylesheets that extract and display the unique word- and phrase-level readings of Ht in relation to the sigils of agreeing and disagreeing witnesses, as well as the whole lines that are unique to Ht (including two very intelligently made unique Latin interpolations). This use of TEI <app> has also enabled the development by Doug Chestnut of the University of Virginia's Alderman Library of JavaScript, CSS, and SVG-based tools for examining patterns of filiation. Though still in an early stage of testing, these tools have strongly pointed to the presence in certain passages of sub-linear contamination from each of two genetic groups of the B-text tradition, FH (by contamination) and OC2, suggesting the presence of two B-text exemplars at some points in the construction of the Ht text:

Figure 1: The Watersheds Diagnostic script applies background color to readings in Ht according to the user's selection of conditions fulfilled by the values of the wit attributes of <lem> and <rdg>. The script can be used to examine levels of contamination from one or more different witnesses as well as across the A, B, and C traditions.
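
The kind of selection described in the caption can be approximated in a few lines. The sketch below is not the Watersheds Diagnostic script itself, which is JavaScript-based; it is a hypothetical Python illustration in which the loci, their sigil sets, and the classify function are invented for the example. Each locus is reduced to the set of sigils agreeing with Ht, and a label standing in for a background color is assigned when exactly one genetic group of interest shares the reading.

```python
# Hypothetical data: each locus mapped to the sigils that would appear in
# the wit attribute of its <lem>, i.e. the witnesses agreeing with Ht.
AGREEMENTS = {
    "loc-1": {"Ht"},
    "loc-2": {"Ht", "O", "C2"},
    "loc-3": {"Ht", "F", "H"},
    "loc-4": {"Ht", "O", "C2", "F", "H", "W"},
}

# The two B-text genetic groups named in the discussion above.
GROUPS = {"OC2": {"O", "C2"}, "FH": {"F", "H"}}


def classify(agreeing, groups=GROUPS, base="Ht"):
    """Return a label standing in for a background color."""
    others = set(agreeing) - {base}
    if not others:
        return "unique"        # no other witness shares the reading
    for name, members in groups.items():
        if others == members:  # exactly this group, and nothing else, agrees
            return name
    return "other"             # agreement is not diagnostic of either group


labels = {loc: classify(wits) for loc, wits in AGREEMENTS.items()}
for loc, label in labels.items():
    print(loc, label)
```

Requiring an exact match with a group is a deliberately strict choice for the sketch; a real diagnostic would presumably let the user loosen or combine such conditions.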

Non-textual features in Ht

§ 53 The need to establish, wherever possible, corroborative evidence for tracking the Ht scribe's shifts from one exemplar to another, and the need to demonstrate if possible that Ht is itself the original manuscript supplying its conflated text, motivated another type of markup that appears in neither the TEI nor the PPEA recommendations: the markup of phenomena that fall into a category of scribal paratext or scribal apparatus, such as corrector's crosses (whether erased or not, and whether or not they seem to have invoked an actual correction), other crosses used to indicate such activities as planned but delayed rubrication, marginal ticks, carets, guide words or guide letters, and marginal or inline scoring other than the plummet used for the ruling. Ideally all of these, in addition to all the ink stints and certain codicological features such as watermarks, would be encoded so as to be available for display and analysis in relation to one another, and in relation to both the textual and codicological divisions and features of the manuscript in varying diagnostic views. Such views could, for example, reveal the typical length of the scribe's correction stints and, if set against the erased letter forms and the forms that replaced them, could also reveal their association with one or more exemplars used as proof text, or the association of a particular paper stock with such a shift in exemplars.

§ 54 Many of these features are quite naturally beyond the scope of the TEI, which is a Text Encoding Initiative, rather than a Codex Encoding Initiative or a Scribal Paratext Encoding Initiative. Nevertheless, evidence of scribal planning and method for correction, ordinatio, and mise-en-page, whether planned only or both planned and executed, relates directly to the scribe's use of his exemplars. As a result, the ambition to discover the number and nature of these exemplars required at the time some sort of true extension to TEI P4, or at least that tag abuse would again have to occur.

Non-textual elements in the TEI P4 and emerging P5 recommendations

§ 55 Existing TEI P4 elements that might have been pressed into service for recording the Ht scribe's paratextual marks, such as <ab>, <join>, <s>, and <seg>, are intended to mark up strings which themselves represent transcribed text or speech, making their use as paratextual markers an act of tag abuse at best. Both the TEI P4 and TEI P5 chapters on Transcription of Primary Sources (P4 18; P5 18) suggest more than one strategy for encoding non-textual features by direct association of such scribal paratextual markings as corrector's crosses and drypoint scoring with such elements as <add> or <del> (P4 18.1.4; P5 18.107.204) or <hi> for lines of ambiguous meaning under or near text which an editor wishes to encode indeterminately (P4 18.2.6; P5 18.108.213).

§ 56 These strategies would appear at first blush to be perfectly suited to the markup of the corrector's crosses and drypoint scores in Ht. In actual application and later processing, however, they proved problematic. For example, all of these strategies are associated with elements that mark text strings, so that even the greater ambiguity afforded by the suggested use of <hi> still associates the mark with a specific text string. This may not seem to be much of a problem until one considers that many lines marked for correction have more than one correction per mark, sometimes in different inks. In this case, with which correction would one associate the mark? Also, a very large number of lines have been marked without any correction appearing to have been made. These marks would probably not be of particular interest to an editor working within the bounds of traditional textual criticism, but they do represent a part of the pattern of proofing that the scribe undertook. The fact that he never acted on scores of marks that he had so carefully made, and the distribution of these omissions, are matters that will be examined in Ht as part of the overall discussion of the scribe's program of proofing and his pattern of correction from varying exemplars.

§ 57 As a result, the markup used to encode such marks needs to be independent of any element directly and necessarily associated with a given text string. Likewise, the drypoint scores in the margins of Ht have no known paratextual function at all and may range in significance from the scribe's merely preparing a new pen prior to inking it to his marking of passages for checking against other exemplars. Ideally, these would also be encoded in such a way as to indicate their general location without associating them with an element containing text that they may or may not have anything to do with. Other paratextual features such as marginal ticks and tiny, lightly inked, indicators for notas, paraphs, and other marginalia also have an existence of their own that is both distinct from and yet related to the textual additions which they invoke.

§ 58 A program of versification to be highlighted by paraphs might be indicated by one hand at one set of points but carried out by another at another set of points, perhaps corrected against the paraph markings of another exemplar, or one set of indicators for marginal glosses may appear to have been enacted while another seems to have been ignored. Does one in such cases assume that the indicators were missed accidentally, supply the paraph or nota in an <add> tag, and then associate the indicator with it? In some cases this might well be warranted, but in others not. Is there an interesting distribution of carets used to indicate the point of insertion of a corrected reading, or of lightly inked marginal ticks as opposed to na abbreviations to indicate a nota? Were the notae ever rubricated? Such differences in scribal habit may coincide with a change in hand or exemplar that is difficult to determine without this added information.

§ 59 This state of affairs highlights the need to develop codicological and scribal paratextual elements and attributes that can function independently of the textual ones already developed by the TEI, yet be examined in direct relation to them if so desired. Such elements and their attributes could be associated with textual elements during later processing, but they would not have to be, and they would be much easier to study both in and out of the context of possibly but not necessarily related textual features.
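
One way to picture this independence is as a set of stand-off records keyed only to a location, which are joined to textual features at processing time rather than at encoding time. The Python sketch below is purely illustrative: the record types, field names, ink colors, and line references are all hypothetical, and it does not reproduce the markup actually used in the Ht edition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A paratextual mark, located by line only (no text string attached)."""
    kind: str  # e.g. "corrector's cross" or "drypoint score"
    line: str  # hypothetical line reference


@dataclass(frozen=True)
class Correction:
    """A textual correction, recorded independently of any mark."""
    line: str
    ink: str


# Invented sample data: one cross with two corrections in different inks,
# one cross with no correction, and one drypoint score.
MARKS = [
    Mark("corrector's cross", "Ht1.10"),
    Mark("corrector's cross", "Ht1.25"),
    Mark("drypoint score", "Ht1.40"),
]
CORRECTIONS = [
    Correction("Ht1.10", "brown"),
    Correction("Ht1.10", "black"),
]


def join_by_line(marks, corrections):
    """Pair each mark with zero or more corrections found on the same line."""
    by_line = defaultdict(list)
    for correction in corrections:
        by_line[correction.line].append(correction)
    return [(mark, by_line.get(mark.line, [])) for mark in marks]


pairs = join_by_line(MARKS, CORRECTIONS)
# Crosses that were never acted upon remain visible as data in their own right.
unacted = [
    mark for mark, found in pairs
    if mark.kind == "corrector's cross" and not found
]
print([mark.line for mark in unacted])
```

Because the association is computed rather than encoded, a mark with several corrections and a mark with none are represented in the same way, and neither forces a decision about which text string the mark belongs to.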

§ 60 A few examples of non-textual phenomena do arise in the TEI recommendations, chiefly in relation to transcribed recorded speech (P4 11.2; P5 11.61), but undertaking an official extension of the TEI guidelines to include paratextual and metatextual phenomena of several more kinds would require a very large and concerted effort, probably by a separate Special Interest Group. Institutionally and in terms of research, this situation touches on directions in which the developing work of the TEI Taskforce on Manuscript Description (http://www.tei-c.org/Activities/MS/) might or might not go, as well as on areas of inquiry to which the TEI recommendations might or might not be applied in the future. The experimental invention of new metatextual elements and attributes for use in the Ht edition has foregrounded both the problems and the possibilities inherent in the attempt to accommodate such elements in a way that can be both fully integrated with text and abstracted from it with equal ease. The remainder of the discussion will outline the experimental method used in encoding Ht and its results, with a summary of the direction in which this work points in terms of new development.

Encoding non-textual features in a digital edition of Ht

§ 61 Since such traces of scribal activity as corrector's crosses mark places where the scribe was re-examining his copied text either against his original exemplar (seeking to expunge his own errors in copying) or against a different exemplar or exemplars (seeking to interpolate what he deemed to be better readings) or even against his own sense of embarrassment at disapproved dialectal forms that had crept into his text, inclusion of this data in regularized markup of the Ht text was deemed to be essential to the goals of the project, which emphasize the scribe, his methods, and materials.

§ 62 The choice between abuse of an existing tag on the one hand, and a large amount of invention on the other is not as straightforward as its seemingly binary nature would suggest. Exploiting elements defined in the PPEA's TEI-conformant DTD but not used by the PPEA at the time would have allowed the continued use of much of the software already developed by the PPEA such as the DTD and its associated parsers, with the simple suppression in standard stylesheets of the added elements as a minimal tweak to already developed browsers.

§ 63 Nevertheless, during the course of the work in question, the TEI <seg> element came into use by the Archive to mark up certain forms of punctuation, a move which highlighted the need not to regard any element as completely open for experimental use in the context of a large, on-going project, especially for marking up paratextual or codicological elements that the PPEA transcriptional protocols explicitly declare not to be recorded in relation to the text in Archive editions.

§ 64 Additionally, an abused tag, however well-documented, tends to conceal its abusive nature behind a familiar and recommended form. As a result, even though the TEI <app> element has been used in a necessarily novel way on the Ht project, it would seem that a better policy in the end would have been boldly to invent new elements, even if they are modeled as exact duplicates of existing elements in all but name. Such, at least, call immediate attention to themselves as something novel. This policy would apply most especially to the final state of a project—the state in which it is most likely to be used in conjunction with other, previously unassociated projects, since no mistake could be made about the nature of the experimentation.

§ 65 As a result and with more mature consideration of the possible consequences of the abuse of <app>, the decision was made to open up a whole new category of elements relating to scribal paratext, including <scribapp> (scribal apparatus) for paratextual phenomena contemporary with the original production of the manuscript, and <histapp> (historical apparatus) for such phenomena produced by hands in no way involved in the original construction of the manuscript, but whose sundry markings in it appear perhaps to point to salient features of the text that might prove useful to future analysis.

§ 66 The definition and application of the new <scribapp> element was complicated by the fact that some paratextual phenomena such as catchwords, guide letters, and quire and leaf signatures are already encoded by the PPEA using the TEI <fw> element. Preserving a neat distinction between manuscript phenomena meant to direct a scribe's work without being noticed by a reader on the one hand (guide words and guide letters, corrector's cruces, marginal drypoint scores, differing marks to indicate later addition of notas, paraphs, and other marginal elements), and those textual elements clearly intended to be seen and read on the other (running titles, headers, colophons, paraphs, and marginalia themselves, distinct from their indicators) was thus not a practical possibility for the Ht project, even though the dichotomy between scribal work as directed during production and scribal work as produced and intended to be seen would have been best in the ideal. For the purposes of this project, then, all phenomena already marked up with the TEI <fw> tag were marked up in that way in Ht, whereas any features under scrutiny that had no markup convention as yet were subsumed into the regime of the new <scribapp> element.

Figure 2: Two features marked up with the newly defined <scribapp> element, a marginal drypoint score (purple and yellow stripe), and an erased corrector's crux (white x with a black background) coincide with an A unary interpolation, revealing a prime location for exploring the possibility that the scribe was consulting more than one A exemplar, invoking the drypoint score.

§ 67 Definition and application of the new <histapp> element was far less complicated, since marginalia by later hands had never been taken into consideration in the PPEA documentary editions except in discursive notes. In Ht, the only features thought to be both later than the original act of production and of any interest for the project were the pencil marks in the outer margins which correspond to many but not all of the shifts between the A, B, and C texts, placed there by someone trying to understand the nature of the Ht text, possibly by Thomas Dunham Whitaker, who based his edition of Piers Plowman (Whitaker 1813) on manuscripts Ht, O, and P.^[9]

Lessons learned

Successes and spinoff

§ 68 The effort to invent and encode the markup for paratextual elements such as corrector's crosses and drypoint scores in Ht has been rewarded with a newly established ability to consider in separate categories those corrections undertaken as part of a programmatic proofing stint, as distinguished from those done on the fly, with this data also set against ink stints recorded using standard TEI <handShift> elements. Drypoint scores, somewhat more elusive in their relations to textual patterns, have also been seen to correspond loosely with shifts between the A, B, and C texts and may eventually be found to correspond with shifts between two or more distinct A, B or C exemplars—a discovery that would have been not entirely impossible, but nearly so, without the ability to generate graphical views of the patterns using CSS and XSL stylesheets, together with XSL-generated tables of corrections, variants, and the scribal markings associated with them.

Experimental, inductive markup: feasible and productive

§ 69 As eccentric as the electronic edition of manuscript Ht of Piers Plowman has proven to be, it has been a testbed for new methods of analysis and a productive ground for analytical tools and methods applicable to the other PPEA editions. Especially useful are those relating to the examination of filiation across the boundaries of the A, B, and C traditions such as the symbolic collation of all unary, binary, and ternary lines, which is now building towards a complete concordance of all A, B, and C line parallels, and the use of the TEI <app> element for the encoding and examination of unique readings and variants shared by a manuscript and genetic groups deemed to be of interest in a given documentary edition. The adaptive use of the <app> element, the symbolic values applied to the rend attribute of <l> to create a categorical form of cert for projects needing to assign certainty levels to thousands of textual loci, together with the completely new experimental elements <scribapp> and <histapp> for encoding phenomena ahead of knowing their full significance, have all provided for a deeper analysis of a manuscript whose textual affiliations were previously impenetrable except on an ad hoc basis, and whose scribal interventions are nearly impossible to track without the aid of computing.

§ 70 Speculation, observation, rigorous encoding and varying forms of display have all become part of the analytical process in the Ht documentary edition, and have also aided in the production of interfaces for both semi-skilled assistance and reference by other editors wishing to concord the three Athlone editions with their own manuscripts, despite the idiosyncratic condition of the Ht text and its forthcoming PPEA edition.

Appendix 1: Awaiting further technical developments: TEI P5, <msDescription>, and a call for a fully TEI-integrated CEI

§ 71 There remain two final, more remote goals of the Ht project that have not been met, and cannot be met by the work of any one editor, on any single project. The first is that of encoding relevant codicological data such as the location and form of watermarks, the number of wirelines over a constant interval, the matrix of chainspace measurements, the mould and felt sides in the paper folios, the hair and flesh sides in the vellum folios, the collation of each quire including missing and added folios, and the quire boundaries in direct relation to the text that is conveyed on the manuscript's codicological base, in a fully realized recommendation of its own. Such a recommendation need not and should not be limited to manuscripts nor even necessarily to codices, since the study of papyri and printed books could also be undertaken using these methods.

§ 72 Much of this work has indeed been done by the TEI Taskforce on Manuscript Description (http://www.tei-c.org/Activities/MS/), in a rich, compact, and flexible encoding standard highly suited to cataloguing and quantitative codicological study, but adaptable through the newly defined <locus> element to studies integrating text and codex. Such adaptations, nevertheless, would occur in the absence of explicit recommendations, and would depend on the use of <locus> to point to existing elements, nearly all of which are intended to mark up text strings as such, and none of which has been designed to indicate the presence of non-textual scribal markings or those made by other hands independent of a text string which may or may not turn out to be associated with it upon further examination.

§ 73 A second, more remote yet perhaps more important goal related to that of integrating text, scribal paratext and codex in a single, TEI-integrated recommendation has been inspired by the vagaries of the Ht project—that of fostering in a practical way the ethos of experimentation in manuscript markup which would be devoted to the full accommodation of manuscript features of an unknown nature, but an ethos that nevertheless fully recognizes itself as the child of the rigors of the TEI encoding recommendations, without which collaborative work cannot proceed. Only such a marriage of rigor with experiment will enable the full realization of the potential of XML and its related technologies for textual studies including not only standard textual criticism and quantitative codicology but also codico-textual work that examines codicological features in relation to the text.

§ 74 Such a goal is eminently realizable, but only through the cooperation of members of the TEI and scholars in the field willing to risk experiment and devote time and attention to recommendation. Both the TEI and Digital Medievalist communities have the talent, the experience, and the vision to make this possible.

Notes

[1]. For a listing of unique lines in Ht, see Russell and Nathan 1963, 126-128. Three of these lines contain quotations from the Vulgate (Ht10.474 [Mt 23.4b], Ht10.552-554 [Mt 23.4a], and Ht21.46-47 [Lk 9.58]) and one of them from the Disticha Catonis, I.38.2.

[2]. A recent review by Andrew Galloway of the PPEA editions of manuscripts F and W has placed a strong emphasis on the new possibilities offered by electronic editions and their search engines for computer-aided analysis of manuscript features such as corrector's crosses and drypoint scores that have hitherto been studied only unsystematically (Galloway 2004, 243-44).

[3]. With few exceptions, the riches of the Piers Plowman Electronic Archive and the University of Virginia's Institute for Advanced Technology in the Humanities were inaccessible to the Ht project because of its unusual character. Straightforward application of legacy scripts, suites of stylesheets and other software from earlier editions, while possible in most current and forthcoming projects, was impossible in the case of Ht, while these same eccentricities made investment of time and money from the larger Project in development to cover this special case unwise.

[4]. For the purpose of the present discussion, the additional possibility of a line or smaller reading being related to a genetic group or whole tradition no longer extant will be left aside.

[5]. Clearly, in the course of this inductive process, when the value of n could finally be assigned with some confidence, and still more when the full reference system had been constructed, the compact encoding in rend became redundant. On the way toward this goal, however, it proved invaluable for its compactness, for its usefulness as a guide to the status of lines in mid-process (when only the A- and B- texts had been examined), and as a way to delay committing to a more elaborate reference system (and perhaps additionally having to break existing display tools) until it was certain which system might be the most appropriate.

[6]. The provision in the P5 recommendation of the <msDescription> (http://www.tei-c.org/release/doc/tei-p5-doc/html/MS.html) tagset goes a very long way—indeed almost all the way—towards making the machine analysis of large groups of manuscripts possible. The strength of the new recommendation lies in its provision of a regularized encoding of manuscript features essential for quantitative codicology, including watermarks, materials, heraldry, catchwords, and signatures (P5 13.71) in a thorough, compact, and flexible format (See P5 13.69 Overview).

P5's <msDescription> (http://www.tei-c.org/release/doc/tei-p5-doc/html/MS.html) also makes possible the beginnings of a recommendation for the encoding of these codicological features in relation to a transcription of the text with the provision of the new <locus> (http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-locus.html) element (P5 13.71.164). This would allow any of the features noted above to be associated with a point in the text, as long as it were possible to reference that feature to some point in a transcription, most likely a folio <milestone> . Since <locus> allows for the specification of an array of discontinuous values in its targets attribute (P5 13.71.164) it could presumably be used to discover, say, that all folios copied by a given hand were done on paper of a given stock that is unique to a particular quire in the manuscript, since the quiring, foliation, and watermarking would all be accounted for and marked up to point to a set of folios. Matrices of features encoded using <locus> could presumably be used to examine patterns of mould and felt, hair and flesh sides, changing paper stocks and the like in relation to a single, complete transcription of a manuscript, though it is not clear in the documentation as of this writing, in early January 2006, whether or not the transcriptions of the specified range of folios are envisioned to be a complete edition or, more likely, discrete passages from the text used as examples only. It would also seem perhaps a possible though also perhaps a very complex matter to associate the manuscript features provided for by the <msDescription> tagset with features in the text such as corrections, corrector's marks, and other features more usefully identified with a particular line or point in a line rather than with a whole folio.

The remainder of the discussion on experimental markup of Ht based on conditions under the P4 recommendation will highlight ways in which the Manuscript Description recommendation might be enhanced to aid in the study of how codicological features are related in a given manuscript to the scribe's practices in copying, proofing, and use of exemplars, especially at this more fine-grained level of examination.

[7]. Experimental use of TEI <app> is underway in the forthcoming edition of manuscript Hm (Huntington Library Ms Hm 128, ed. Hoyt N. Duggan, Michael Calabrese, and Thorlac Turville-Petre), though in this case it is used to examine only the Hm unique readings and those genetic groups already known to be of interest. Nevertheless, using this method, the Hm lemmas and the sigils of the manuscripts to which they are related need be encoded only once for use in multiple places and for multiple purposes throughout the Hm edition, something that was not possible with the use of notes inside <note> elements.

[8]. The final <rdg> in this usage always represents the reading of all manuscripts whose sigils are not already recorded in the current <app>, a necessary measure to take into account the fact that some sigils and their associated readings are, unfortunately, implied rather than specified in the Athlone apparatus. When <app> tagging is complete, an application developed by Shayne Brandon at the University of Virginia Institute for Advanced Technology in the Humanities examines all sigils present in the current <app> and replaces the value {sigma} (a variable customarily used by PPEA editors to represent all other manuscripts) with any sigils that may remain unaccounted for. Sometimes, as in the example above, the reading represented by the final <rdg> value (here {theta}) happens to be Kane and Donaldson's conjectural emendation, since all other B-text sigils are already in fact accounted for. In this case, then, the value {sigma} would not be replaced with any sigils, and this final <rdg> could be either expunged or used for a later examination of Kane, Donaldson, and Russell's emendations, independent of the Ht project.

[9]. Manuscript O (Oriel College Oxford 79) is a B manuscript, P (Huntington Library Hm137) is a C manuscript, and Ht, as we have seen, can be almost anything at any point, so it is quite possible that Whitaker, whose enthusiasm for comparing the various versions dwindled as he made his way through the poem, may have tried by an early nineteenth-century means to model graphically the shifting textual features of Ht and found the method wanting (Brewer 1996, 42-45).
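
The replacement of {sigma} described in note 8 amounts to a set difference between the full list of sigils and those already named in an entry. The following Python sketch is an illustration under stated assumptions, not the application developed at the Institute for Advanced Technology in the Humanities. The function name resolve_sigma is hypothetical, and the witness list is simply Ht plus the sigils named in the Ht5.163 example, which note 8 describes as accounting for all other B-text sigils. The second, shorter entry is invented to show a case in which sigils do remain.

```python
# Assumed witness list: Ht plus the sigils named in the Ht5.163 example.
ALL_SIGILS = "Ht Cr1 Cr2 Cr3 Y O C2 C Bm Bo Cot L M H F R Hm W".split()


def resolve_sigma(all_sigils, readings, placeholder="{sigma}"):
    """Replace the placeholder with every sigil not named elsewhere.

    readings is a list of (sigils, value) pairs, one per <lem> or <rdg>.
    """
    named = {s for sigils, _ in readings for s in sigils if s != placeholder}
    remaining = [s for s in all_sigils if s not in named]
    resolved = []
    for sigils, value in readings:
        if placeholder in sigils:
            sigils = [s for s in sigils if s != placeholder] + remaining
        resolved.append((list(sigils), value))
    return resolved


# The Ht5.163 entry, with Ht assumed as the sole witness of the lemma.
ht5_163 = [
    (["Ht"], "lem"),
    ("Cr1 Cr2 Cr3 Y O C2 C Bm Bo Cot L M H".split(), "{gamma}"),
    (["F"], "{delta}"),
    (["R"], "{epsilon}"),
    (["Hm"], "{zeta}"),
    (["W"], "{eta}"),
    (["{sigma}"], "{theta}"),
]
full = resolve_sigma(ALL_SIGILS, ht5_163)

# Invented entry in which most sigils are only implied.
short = [(["Ht"], "lem"), (["F"], "{gamma}"), (["{sigma}"], "{delta}")]
partial = resolve_sigma(ALL_SIGILS, short)
print(partial[-1][0])
```

In the first case the final reading ends up with no sigils at all, which corresponds to the situation in which it represents an editorial emendation rather than a manuscript reading.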

Acknowledgements

Acknowledgment for the design and building of the JR browser seen in the illustrations is owed to Jonathan Rodney, Technical Consultant and Software Developer for the Piers Plowman Electronic Archive. Specialized stylesheets have been added to the JR viewer for the Ht edition thanks to the advice and instruction of Doug Chestnut, Technical Web Manager, Alderman Library, University of Virginia, who is also the sole author of the Watershed and Genetic Groups Diagnostics tools. My sincere gratitude also goes to Guy Mengel, Director of IT Systems, University of Virginia Libraries, who has always had a minute or two to answer the sometimes vague technical questions of a rank amateur to whom he has never had any official obligation.

All three of these developers have consistently shown the patience and genuine interest that come only from a love of learning for its own sake, for which they receive my highest praise and thanks.

Works cited

Adams, Robert, 2000. Evidence for the stemma of the Piers Plowman B manuscripts. Studies in Bibliography 53: 173-194.

───, 2002. The R/F mss of Piers Plowman and the pattern of alpha/beta complementary omissions: Implications for critical editing. Text 14: 109-137.

Alford, John, 1992. Piers Plowman: A guide to the quotations. Medieval and Renaissance Texts and Studies 77. Binghamton, NY: Center for Medieval and Early Renaissance Studies, State University of New York.

Boas, Marcus and Hendrik Johan Botschuyver, eds., 1952. Disticha Catonis. Amsterdam: North-Holland.

Brewer, Charlotte, 1996. Editing Piers Plowman: The evolution of the text. Cambridge Studies in Medieval Literature 28. Cambridge: Cambridge University Press.

Cummings, James, 2006. Liturgy, drama, and the archive: Three conversions from legacy formats to TEI XML. Digital Medievalist 2.1.

Duggan, Hoyt N., 1993. A new critical-diplomatic edition of Piers Plowman B in hypertext. Æstel 1: 55-75.

───, 1994. Piers Plowman Electronic Archive research prospectus: Archive goals. Charlottesville, Virginia: Institute for Advanced Technology in the Humanities, University of Virginia. http://jefferson.village.virginia.edu/seenet/piers/archivegoals.htm

Galloway, Andrew, 2004. Reading Piers Plowman in the fifteenth and twenty-first centuries: Notes on manuscripts F and W in the Piers Plowman Electronic Archive. Journal of English and Germanic Philology 103: 232-252.

Hanna, Ralph III, 1996. Pursuing history: Middle English manuscripts and their texts. Stanford: Stanford University Press.

───, 1989. The scribe of Huntington HM 114. Studies in Bibliography 42: 120-133.

Kane, George, ed., 1988. Piers Plowman: The A version, Will's visions of Piers Plowman and Do-Well: An edition in the form of Trinity College Cambridge MS R.3.14, corrected from other manuscripts, with variant readings. By William Langland. Rev. ed. London: Athlone Press.

Kane, George, and E. Talbot Donaldson, eds., 1988. Piers Plowman: The B version, Will's visions of Piers Plowman, Do-Well, Do-Better and Do-Best: An edition in the form of Trinity College Cambridge MS B.15.17, corrected and restored from the known evidence, with variant readings. By William Langland. Rev. ed. London: Athlone Press.

Russell, George H. and George Kane, eds., 1997. Piers Plowman: The C version, Will's visions of Piers Plowman, Do-Well, Do-Better and Do-Best: An edition in the form of Huntington Library MS 143, corrected and restored from the known evidence, with variant readings. By William Langland. Rev. ed. London: Athlone Press.

───, and Venetia Nathan, 1963. A Piers Plowman manuscript in the Huntington Library. The Huntington Library Quarterly 26: 119-130.

Scase, Wendy, 1987. Two Piers Plowman C-text interpolations: Evidence for a second textual tradition. Notes and Queries 232: 456-463.

Seymour, M. C., 1974. The scribe of Huntington Library MS. HM 114. Medium Aevum 43: 139-143.

Skeat, Walter W., ed., 1886. The vision of William concerning Piers the Plowman in three parallel texts, together with Richard the Redeless. By William Langland. Oxford: Clarendon Press.

───, ed., 1873. The Visions of William concerning Piers the Plowman, Dowel, Dobet, and Dobest: The Whitaker text; Or text C. By William Langland. London: N. Trübner.

Sperberg-McQueen, C. M. and Lou Burnard, eds., 2002. TEI P4: Guidelines for electronic text encoding and interchange. 2 vols. Oxford: TEI Consortium and the Humanities Computing Unit, University of Oxford. http://www.tei-c.org/Guidelines2/index.xml.ID=P4

───, eds., revised and re-edited, January 2005. TEI P5: Guidelines for electronic text encoding and interchange. Oxford: TEI Consortium and the Humanities Computing Unit, University of Oxford. http://www.tei-c.org/release/doc/tei-p5-doc/html/

Whitaker, Thomas Dunham, ed., 1813. Visio Willi de Petro Plouhman, item visiones ejusdem de Dowel, Dobet, et Dobest, or the vision of William concerning Piers Plouhman, and the visions of the same concerning the origin, progress and perfection of the Christian life. By William Langland. London: J. Murray.

Wittig, Joseph S., 2001. Piers Plowman: Concordance, Will's visions of Piers Plowman, Do-Well, Do-Better and Do-Best, a lemmatized analysis of the English vocabulary of the A, B and C versions as presented in the Athlone editions, with supplementary concordances of the Latin and French macaronics. London: Athlone Press.