New Textual Traditions from Community Transcription

Frederick W Gibbs

doi:10.16995/dm.39

Introduction

§ 1 Finding manuscripts relevant to a particular research project, as well as understanding how such texts changed over time, remains a daunting task for many medievalists. This challenge persists even in the face of increasing digitization efforts because many manuscripts (at least of medieval and early modern texts) remain only viewable as images rather than accessible as full text. Regardless of kind, digital versions remain largely isolated from each other in libraries and individual project silos. Needless to say, the required time, effort, and expense of producing full text resources seriously hinders their production. Such limitations carry several unfortunate consequences: firstly, minor texts or texts without obvious utility get short shrift; secondly, a tremendous amount of parsing, evaluating, and transcribing of understudied manuscripts gets left behind; and thirdly, the invisibility of and disconnectedness between manuscripts constrain our ability to build new research corpora.

§ 2 New technologies and methodologies in the digital humanities can help meet some of these challenges. I do not mean that they might make traditional editing or transcription practices more efficient. Rather, I would like to argue for a greater emphasis on creating digital noncritical editions that can capture traditionally lost transcription work, harness community expertise, and create a new kind of textual archive. I offer here a theoretical justification for community transcription practices and the textual archive it will produce. In what follows, I describe some key benefits of embracing a web-based transcription tool that would provide a number of advantages over conventional textual practices. In particular, I argue how such a tool can encourage large-scale collaborative transcription and editing in order to make more manuscripts much more visible, accessible, connectable, correctable, and usable.

Embracing the non-critical edition

§ 3 The venerated critical edition has served as the primary vehicle for delivering medieval and early modern manuscripts to scholars who need them. Yet two particular criticisms of traditional editorial practices, especially as clearly formulated by Jerome McGann (1983), have echoed throughout the last several decades:

textual editors have challenged the value and representativeness of canonical editions derived from a single editorial voice that claims to reflect authorial intent;
typical editing conventions compress historical and linguistic data to create an anachronistic version of the text that many users of the text never had access to.

§ 4 Both practices obscure the textual transformations between editions (and influences of related works) that can teach us about the production, evolution, and transmission of the texts themselves. Rather than compress data and redact textual variations to get the true text, web standards and technology now allow us easily to embrace multiple editions and to learn about textual evolution and transmission, especially if aided by analytical tools that can enable study of manuscripts on previously impossible scales.

§ 5 As availability and access to manuscripts has grown, so too has the desire to improve the granularity of our knowledge about which texts were available at a certain place at a certain time. One brief example from my own field of study, medieval medicine, can illustrate the point. An important but somewhat enigmatic twelfth-century text on women's medicine known as the Trotula comprises three distinct manuscripts on different topics, of which there are fifteen contemporary variations. These variants are a patchwork of local Italian and much earlier Arabic medical knowledge, and we know that later readers of the Trotula used a rather different text than what we might establish as the earliest edition (Green 2001). Yet the partial versions of the various proto-texts and the obviously later variants make them rather unsuited for publication, and therefore very difficult to compare with other relevant manuscripts. As a result, we lose the ability to understand fully how this text was shaped over time by a confluence of various medical texts, traditions, and cultures. To make any headway in understanding the diffusion of knowledge and its influences, we need easier ways of handling such texts — ways that can be used not only by textual editors, but also by any potential user of those texts, regardless of specific field of study.

§ 6 Although the shortcomings of the critical edition have been discussed for some time, little could be done to respond to them in practice. The limitations and conventions of traditional publishing and scholarly practice, for example, have made it virtually impossible to print variant editions of manuscripts, or to edit them in large collaborative teams. So how do we make more texts visible and available, whether for their individual value or value as part of a larger research corpus? How can we embrace the unstable and mobile text (Brockbank 1991) to help answer questions about textual change on both local and especially much broader scales?

§ 7 Edward Vanhoutte has suggested that the reason for neglect of noncritical editing in theory and practice is a "lack of a satisfactory ontology of the text on which a methodology of noncritical editing can be modeled" (Vanhoutte 2006). In my view, this problem is not sufficiently different from the one that persists with critical editing itself: an editor or transcriber must always make decisions about what constitutes the text. Instead, I would suggest a much less sophisticated answer: that there has not been any practical way to create noncritical editions that would not be prohibitively idiosyncratic and that could be used by the community at large to put more manuscripts in conversation with each other. This of course is one of the principal reasons for noncritical editing in the first place.

§ 8 The creation of noncritical texts certainly raises new issues of authority and quality. Bob Rosenberg tells us that "the most important point to be made about any digital documentary edition is that the editors' fundamental intellectual work is unchanged" (Rosenberg 2006). Arguably, creating metadata and mark-up does in fact create new intellectual challenges and choices for the editor. Regardless, I here plead for a new kind of documentary electronic edition where the fundamental intellectual goals and practices are in fact rather different: I am suggesting a shift in values from privileging the critical edition to prioritizing the creation of visible manuscript text. However, I do not argue against the critical edition so much as suggest some textual practices that can co-exist with it and provide complementary functions.

§ 9 Before outlining some advantages for community transcription of noncritical editions, I should lay bare a few presumptions I have about the gap between theory and practice with respect to the future of textual analysis. First, I wholly indulge in the fantasy that centralized databases and other repositories of metadata can be obviated through the widespread application of well-standardized semantic web technologies. But this is, of course, like Tantalus's next meal, continually out of reach. One reason for this is that the technical difficulties and heavy labor requirements create a serious bottleneck for producing appropriately encoded and marked-up texts. Another reason is that there is hardly any agreement about which of several viable standards will be most usable in the long term. In both the short and long term, we need a more scalable solution than individual mark-up projects that, despite laudable goals, tend to rediscover the difficulties of text encoding. Second, despite promising recent advances, usable OCR (optical character recognition) remains a long way off for manuscripts and even for early printed texts. The variety of characters, hands, and layouts will make manuscript OCR a significant challenge for quite some time. Even once we have more reliable OCR technology, it would be nice to have an infrastructure to allow the manuscripts to be viewed together and improved by user expertise.

§ 10 I want to emphasize that my interest here lies in promoting the methodological practice of archiving quick and dirty transcriptions, rather than solving all of the technical and design challenges that such a transcription tool presents, though I would argue that they are best solved in practice anyway. Built on an open web platform that can uniformly implement mark-up standards and avoid impossible-to-maintain hardware and software requirements, such a tool would make new texts available, a boon to scholars across all disciplines. What follows is by no means an exhaustive list, but I present some advantages of such a methodology.

Preserving and presenting lost work(s)

§ 11 Even though relatively few scholars work predominately as textual editors, many others often engage in localized textual editing efforts as part of larger research projects. In this way, we are all part of a decentralized team working toward the same (indirect) goal of making manuscripts more usable. A quick thought experiment: imagine if all the rough transcription that scholars have done over the centuries — work that has been reduced to a few quotations in footnotes — had been more fully preserved and was easily accessible. How might our texts, and especially our interpretations based on them, differ? Recent ease of publication and distribution makes it almost trivial to create such an archive from now on. Towards this end, I suggest that the scholarly community should think less about editions and more about versions of texts, configurable texts, and working in textual communities that will help scholars leverage community experience and expertise. Thus, a web-based transcription tool gives a practical embodiment to Reiman's aging but still insightful suggestion to emphasize "versioning" over "editing" (Reiman 1987). It encourages us to shift our emphasis from idiosyncratic final texts to the processes and practices in revealing and connecting texts as a collaborative effort.

§ 12 I contend that embracing the notion of textual communities will dramatically increase the visibility and usability of manuscripts as a whole, as well as the possibilities for interdisciplinary work. While partial transcriptions are, of course, unsuitable for traditional publications, availability of texts no longer needs to be bottlenecked by antiquated academic convention. Even with incomplete or imperfect transcriptions, the resulting increase in visibility will make the rich manuscript tradition accessible to high-level searches that scholars have come to rely on. Furthermore, researchers will be able to create ad hoc research corpora that could be used, for example, with text-mining tools to help analyze similarities and differences, to visualize and recognize changes more easily, and to trace the movement of textual knowledge over time and space.
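As a rough illustration of what an ad hoc corpus makes possible, the following minimal sketch compares every pair of transcribed versions word by word using Python's standard `difflib` module. The function name `pairwise_similarity`, the sigla, and the sample strings are invented for illustration; a real text-mining workflow would need normalization of spelling and abbreviation that is not attempted here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(versions):
    """Score each pair of transcriptions by word-level similarity.

    `versions` maps a siglum (hypothetical label for a witness) to its
    transcribed text. Returns a dict keyed by (siglum, siglum) pairs with
    a ratio between 0.0 (nothing shared) and 1.0 (identical word sequence).
    """
    scores = {}
    # Sort by siglum so the pair keys come out in a predictable order.
    for (a, text_a), (b, text_b) in combinations(sorted(versions.items()), 2):
        matcher = SequenceMatcher(None, text_a.split(), text_b.split())
        scores[(a, b)] = matcher.ratio()
    return scores


# Invented sample strings, not quotations from any manuscript.
sample = {
    "A": "aqua calida et mel",
    "B": "aqua calida et mel",
    "C": "vinum et oleum",
}
similarity = pairwise_similarity(sample)
```

Even a crude measure of this kind could help a researcher decide which rough transcriptions deserve a closer look, which is the point of prioritizing visibility.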

§ 13 One of the principal criticisms of community transcription has been called the Babel objection: the idea that inferior contributions will create so much extra noise to filter out that we won't end up with anything useful at all. Won't we be creating essentially a black hole of data with no hope of separating the wheat from the chaff? What do we do with all the junk? Perhaps we need, to borrow a phrase from Bill Turkel, a methodology for the infinite archive, like better storing and searching protocols. But the reality is that transcriptions of medieval and early modern texts will always be far from infinite or even overabundant. Ultimately, the best solution is simply to recognize the fallacy of the objection. Is it in fact reasonable to expect that anyone with training and interest in medieval manuscripts is going to contribute (and be allowed to continue contributing) anything so egregiously bad that the work would be wholly detrimental to an ongoing and self-correcting community effort? Hardly. If an unprecedentedly large archive creates a problem, it is a challenge to be warmly embraced, not avoided.

§ 14 I have used the term "community transcription," but surely many readers will recognize this approach as crowd-sourcing. But there is an important distinction that must not be overlooked when thinking about how to build scholarly research corpora. To think of Wikipedia (as many do, in my experience) and its highly variable article quality as representative of what will happen with community transcription is to make a category mistake. While just about anyone might feel like they can contribute to Wikipedia (indeed, that is the point), users of an online transcription tool for medieval manuscripts are a far more self-selecting group. While anyone could view texts, user registration would be required to edit texts. To assume that all work must be vetted by a firm editorial voice is to ignore the vast potential of highly trained and motivated community practitioners who want to work together to discover relevant texts.

§ 15 While quality of data remains a valid concern, I side with Anselm in that something that exists in reality is better than something that exists in the mind. Practically speaking, even when transcription quality is in doubt, it will be relatively easy for a researcher to determine if a manuscript warrants further study for a particular research project. Having an unrepresentative variant of a text is far better than having no knowledge of a text's existence. That is, we ought to prioritize visibility over accuracy. Another similar concern is that scholars will be confronted with too many adequately transcribed but simply unnecessary variants. But I propose that this tool encourages just the opposite. With a light editorial hand and proper interface, the greater visibility will bring more texts into the field of view and encourage engagement with them. Gradual emergence of standard or typical readings will come from community consensus and practice. Variations will be quickly viewable, but will not stand in the way.

§ 16 It must be emphasized that the goal of collaborative online editing is not perfect diplomatic transcriptions or mark-up. Nor do I suggest that crowd-sourced transcriptions serve the same function as, or could replace, the time-honored critical edition. But even for cases of texts that have been heavily edited over time, the tool provides easy ways of viewing change and particular editions. More importantly, a transcription tool focused on community contributions over time, even if partial and imperfect, can free scholars from the constraints of the critical edition, and let people see texts that they have determined are relevant to their research. At a bare minimum, pooling our transcription efforts will vastly improve the granularity of textual information. Having a large corpus, even of partial texts, obviously opens new doors for research.

§ 17 The quality of transcriptions from the community at large, at least in the short term, is perhaps not as useful for philologists or linguists, who often require the most precise possible transcriptions, as well as transparency in the interpretive work done between the manuscript itself and its transcription. The general editing principles behind the tool, and the slightly more uncertain editorial authority, make precise textual work problematic. But this tool could be used by philologists to make the precise transcriptions they need — it is just not designed primarily for that kind of specialized use. Overall, however, a number of existing projects have demonstrated the feasibility of and enthusiastic response to collaborative transcription and their goal of reaching wide audiences, such as the Australian Newspapers Digitization Program, Transcribe Bentham, and Papers of the War Department, to name just a few.

Help with transcription and encoding challenges

§ 18 Even if the benefits to the community are clear, why would individuals bother to use an online tool for transcribing medieval manuscripts? At an entirely functional level, such a collaborative approach can help with transcription challenges — like making sense of unusual abbreviations, unfamiliar words, or obscure references — by drawing on the collective intelligence and experience of the community. Users could, of course, silently expand abbreviations during rough transcription (as they often do). But they could also quickly represent them with regular keyboard characters (faster than finding Unicode values), creating over time a dictionary of abbreviations that can be used to provide suggestions when transcribing.
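The abbreviation dictionary described above could take many forms. The following is a minimal sketch in Python, assuming that transcribers type a keyboard shorthand and then record the expansion they chose; the class name `AbbreviationDictionary` and the shorthand notation `p~` are hypothetical, not an existing convention.

```python
from collections import Counter, defaultdict


class AbbreviationDictionary:
    """Community-built map from keyboard shorthand to observed expansions."""

    def __init__(self):
        # shorthand -> Counter of expansions chosen by transcribers
        self._expansions = defaultdict(Counter)

    def record(self, shorthand, expansion):
        """Note that a transcriber expanded `shorthand` as `expansion`."""
        self._expansions[shorthand][expansion] += 1

    def suggest(self, shorthand, limit=3):
        """Return up to `limit` expansions, most frequently chosen first."""
        counts = self._expansions.get(shorthand, Counter())
        return [expansion for expansion, _ in counts.most_common(limit)]


# Hypothetical usage: "p~" stands in for an abbreviated 'p' with a stroke.
abbreviations = AbbreviationDictionary()
abbreviations.record("p~", "per")
abbreviations.record("p~", "pro")
abbreviations.record("p~", "per")
suggestions = abbreviations.suggest("p~")
```

Ranking by frequency is only one possible design choice; suggestions could equally be weighted by date, region, or scribal hand once enough transcriptions exist.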

§ 19 It should be emphasized that users are not obliged to use this functionality; a transcriber need not enter abbreviations at all. Obviously, preserving arbitrary scribal characters is a huge task in itself and adds considerable time to the task. But again, because the primary goal is visibility, a fully diplomatic or complete transcription is not as crucial, especially since no single standard for capturing the many variations has ever emerged (Vander Meulen and Tanselle 1999).

§ 20 With respect to preserving the visual and linguistic artifacts of a manuscript itself, semantic web technologies and descriptive mark-up schemas like TEI hold great promise not only for their ability to preserve document structure, but also for the way in which they can help scholars find texts relevant to their research that would otherwise remain unknown to them. But the learning curve is steep, and text encoding projects remain slow and expensive.

§ 21 To improve matters, a community transcription tool will reduce significantly the barrier to entry and encourage mark-up of texts. To be sure, this is a complex user interface challenge. This is not the place to hash out design solutions, but rather to re-orient our thinking about how and why mark-up can and should be carried out incrementally by individuals over time in order to realize the potential of text encoding and further improve visibility and connectivity of manuscripts. Of course, users would not be required to mark up texts that they transcribe, but a highly polished interface for transcription offers the perfect platform to enable basic TEI mark-up of broad structures. Admittedly, marking up a complex revision process will continue to require dedicated editors. But as with the transcriptions of the texts, mark-up completeness is not essential. It is simply not necessary either to do it right or not at all, provided that we can expect and embrace incremental advancements from the community.

Between authority and autonomy

§ 22 Any effort to bridge theory and practice of electronic noncritical text editing must address (at least) two primary needs. First, to provide a way of maintaining a historical record of a text that has been edited by the community: who has done what, and when? Second, to mediate between authority and autonomy — that is, to allow researchers to contribute changes that they think are valuable, even to the same text at the same time — while retaining the ability for individual users to decide what they want and don't want to use, or even see.

§ 23 To address both issues, I suggest that we borrow from the principles and practices of open-source software development, namely the use of Distributed Version Control (DVC). Such a system maintains versions of texts that are publicly available, and yet also allows users to create private transcriptions that can then, or not, be returned to the community. Distributed version control improves on the premise of centralized version control, in which everyone must take one version of a text as the master copy, and which thus remains limited by centralized and top-down editorial authority. DVC is much more flexible in that regard. Even though the tool would provide a central repository for transcribed texts, it does not require that everyone work with the same version of the document at the same time. People can work on different parts independently, sharing or not sharing work as they go. In this way, the advantage of distributed versions over centralized versions is that on the whole they mediate between authority and autonomy. Despite some extra overhead and logistical challenges, decentralization retains crucial freedom for individual editors. At the same time, version control maintains authority: researchers can know who has done what with the transcriptions. DVC also enables citation of changing texts. Because it maintains a full history of edits, it is possible to view (and cite) the text as it was at any given time.
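To make the idea of a citable, attributable history concrete, here is a minimal sketch in Python. It assumes a single linear history for one text, which is far simpler than a real distributed system such as Git; the names `Revision`, `TranscriptionHistory`, `commit`, `as_of`, and `contributions` are hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    revision_id: str   # short content-derived identifier, usable in a citation
    editor: str        # who made the edit
    timestamp: datetime  # when it was made
    text: str          # the transcription as it stood after the edit


class TranscriptionHistory:
    """Append-only record of every state of one transcription."""

    def __init__(self):
        self._revisions = []

    def commit(self, editor, text, timestamp):
        # Chain each identifier to its parent so that a revision id
        # depends on the entire preceding history of the text.
        parent = self._revisions[-1].revision_id if self._revisions else ""
        payload = f"{parent}\n{editor}\n{timestamp.isoformat()}\n{text}"
        revision_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
        revision = Revision(revision_id, editor, timestamp, text)
        self._revisions.append(revision)
        return revision

    def as_of(self, moment):
        """Return the latest revision made at or before `moment`, or None."""
        candidates = [r for r in self._revisions if r.timestamp <= moment]
        return max(candidates, key=lambda r: r.timestamp) if candidates else None

    def contributions(self):
        """Count revisions per editor, e.g. as a basis for scholarly credit."""
        counts = {}
        for revision in self._revisions:
            counts[revision.editor] = counts.get(revision.editor, 0) + 1
        return counts


# Hypothetical editors and invented sample text.
history = TranscriptionHistory()
first = history.commit("alice", "In principio", datetime(2011, 1, 1))
second = history.commit("bob", "In principio erat", datetime(2011, 2, 1))
cited = history.as_of(datetime(2011, 1, 15))
```

A citation could then pair the text's identifier with `cited.revision_id`, so that a reader retrieves exactly the state of the transcription that the author consulted.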

§ 24 Leaving aside the technical details, the workflow might go something like this. First, anyone interested in working on a particular text will get or create a version of it. When done with a discrete set of edits (smaller ones are easier to manage), they upload them to the repository. Any conflicts with others who have edited the same part of the same document in the meantime are reconciled (this happens more in software development than it will in manuscript transcription, I imagine). The edits might then be approved by one of many editors who make sure they are reasonable but impose no other editorial control. They then pass into the hands of the community, where they might be reviewed, reused, or lie dormant. When contributors upload transcriptions to the repository, DVC software can automatically merge changes that are independent of each other, but direct conflicts must be resolved manually. Of course, conflicts do not need to be resolved at all. Though not as practical with code as with manuscripts, it will often be valuable to maintain multiple (possible) versions of a text. TEI gives us the ability to obviate conflicts by simply embedding the variants within a single edited version of the text. This model has been used successfully, if somewhat opaquely, by papyri.info, and should be extended to more complicated textual traditions as well.
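The merge step, and the option of keeping a conflict rather than resolving it, can be sketched as follows. This is a simplified illustration in Python, not a description of how any existing tool works: it assumes that all three versions have the same number of lines, whereas real DVC software aligns changed regions itself. The function name `merge_lines` and the witness sigla `#A` and `#B` are hypothetical; `<app>` and `<rdg>` are the TEI elements for an apparatus entry and a reading.

```python
from xml.sax.saxutils import escape, quoteattr


def merge_lines(base, ours, theirs, our_wit="#A", their_wit="#B"):
    """Three-way merge of line-aligned transcriptions.

    `base` is the shared starting version; `ours` and `theirs` are two
    independently edited copies. Independent changes merge automatically.
    Where both copies changed the same line differently, both readings
    are kept side by side as a TEI apparatus entry.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch assumes line-aligned versions")

    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # both copies agree (or neither changed the line)
            merged.append(escape(o))
        elif o == b:      # only the other copy changed this line
            merged.append(escape(t))
        elif t == b:      # only our copy changed this line
            merged.append(escape(o))
        else:             # direct conflict: embed both variants
            merged.append(
                f"<app><rdg wit={quoteattr(our_wit)}>{escape(o)}</rdg>"
                f"<rdg wit={quoteattr(their_wit)}>{escape(t)}</rdg></app>"
            )
    return merged


# Invented sample lines, not taken from any manuscript.
base = ["In principio", "erat verbum"]
ours = ["In principio", "erat uerbum"]
theirs = ["In principio creavit", "erat verbu"]
result = merge_lines(base, ours, theirs)
```

Here the first line merges cleanly because only one contributor changed it, while the second line is preserved as two readings for later users to weigh.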

§ 25 Additionally, as the humanities rethink how to recognize digital and non-traditional scholarship, DVC can, as mentioned, track edits by particular users and thus provide a mechanism to recognize (even partial) transcription work as a serious contribution to the scholarly community. The criticism that quantification will encourage non-substantive contributions to inflate the apparent value of one's effort is unoriginal (and happens even now), and is easily mitigated through interface design and community convention.

Texts in contexts

§ 26 Participating in a community transcription effort to create a textual database makes it easier to situate one's own texts in the context of other texts and discover relationships that would otherwise remain invisible. Perhaps we might benefit from a single, authoritative archive for manuscripts, but that's not what I'm arguing for here. Rather, I'm suggesting that a tool agnostic to both library and project affiliations can complement existing cataloging projects, like Manuscriptorium and the ENRICH project, to create a powerful new collaborative environment that unifies public archives and the private workspace. Indeed, the recent API Workshop held at the Maryland Institute for Technology in the Humanities spawned an impromptu session that quickly agreed upon the need for creating a generic transcription tool that could be used by various transcription projects to help connect their textual resources. While the enthusiasm was directed primarily at the value of community transcription, I want to emphasize the value not only of the transcription functionality, but also of the much larger archive of texts it could create — a feature that does not seem to be a high priority for most transcription projects.

§ 27 On a broad scale, even small contributions of rough transcriptions will, over time, vastly improve our documentary knowledge as a whole by aggregating individual research projects. One advantage, in the case of the aforementioned Trotula, for example, is that it makes it considerably easier to trace movements of and influences on textual traditions. As Mats Dahlström reminds us, "digital scholarly editing offers the chance to organize paratexts and transmitted material in much more dynamic and complex manners than is possible within the printed edition" (Dahlström 2009). In terms of improving our textual granularity, visualizing and editing relationships between paratext and maintext can be a new way of establishing the state of a text at any given time (Monella 2008). As a platform for text creation and consultation, existing tools like TEI comparator and TEIViewer can be used to take full advantage of the text repository, at least of encoded texts (Schlitz and Bodine 2009).

§ 28 Such features are heavily dependent on an unobtrusive, functional, and intuitive interface. Texts must be easily (re)configured, and variations between versions must be easily displayed or hidden. As mentioned earlier, I want to emphasize that this is a design/interface problem, not a problem with the idea of collecting as much text as possible. All data is good, as long as it can be managed. Fortunately, web interface technologies have advanced to the point where this is no longer the Sisyphean task it once was.

§ 29 It is perhaps worth mentioning that a useful transcription tool would not, and perhaps should not, need to function as an image presentation platform as do promising tools like T-PEN and Scripto, at least at the outset. The transcriber might be sitting in front of the actual document, a photocopy, a microfilm machine, a PDF, or an image from elsewhere, like from a library website or even Google Books. The effort to embed text in an image (transcription as annotation) can certainly bring exciting research possibilities (Lecolinet, Robert, and Role 2002), especially with efforts toward standardization like that of the recent work of the Open Annotation Collaboration, and projects like TILE. The idea is certainly well worth pursuing, but issues with ownership, copyright, and similar contingencies hinder practical implementation and seriously restrict the kinds of texts that could be transcribed; it might be best left for later development and not a prerequisite for a community transcription tool. An early focus on images may also discriminate against using the tool for capturing casual transcription regardless of the text's medium.

Conclusion

§ 30 I have argued that embracing the notion of community-driven, noncritical transcriptions will make dramatic progress toward discovering new textual traditions. By providing incentives for both individual and community participation, a web transcription tool will help reveal relevant texts, encourage cross-disciplinary work, and illuminate the development of ideas and texts over time.

§ 31 To return to the critical edition for a moment, the adoption of a community transcription tool frees scholars from the biases of the single editorial voice. Similarly, it allows freedom from authorial intention as the central editorial principle, in favor of versioned texts that could be used individually or in aggregate. A greater focus on preserving quick and dirty transcription will provide a valuable complement to canonical editions and make available more versions of manuscripts that actually existed, as well as texts that would never get a critical edition in the first place.

§ 32 There is little doubt that any success of web-based collaborative transcription will depend on embracing new technologies, practices, and interfaces. Certainly, realizing any of the theoretical benefits will require new workflows and overcoming complex user-interface design challenges. Exactly how the interface(s) should look and work requires an article in its own right, and best practices and consensus will likely emerge only after significant community engagement with some experimental prototypes. But what I am advocating here is not fundamentally about technology or design, but embracing transparency and openness in the ways in which we make texts available. This means shifting values toward creating and maintaining an archive of imperfect, but usable and visible texts. In this way, perhaps the greatest value of the tool and resultant textual archive will be more in the distant than the immediate future. But as the requisite technologies become stable and familiar, we must continue to advance the critical discussion of the possibilities and perils of new editing methodologies and principles now available to us.

Works cited

Burnard, Lou, Katherine O'Brien O'Keeffe, and John Unsworth, eds. 2006. Electronic textual editing. New York: Modern Language Association of America.

Brockbank, P. 1991. Towards a mobile text, in Small and Walsh 1991. 90-106.

Dahlström, Mats. 2009. The compleat edition, in Deegan and Sutherland 2009. 27-44.

Deegan, Marilyn, and Kathryn Sutherland, eds. 2009. Text editing, print and the digital world. Farnham: Ashgate.

Green, Monica. 2001. The Trotula: An English translation of the medieval compendium of women's medicine. Philadelphia: University of Pennsylvania Press.

Lecolinet, Eric, Laurent Robert, and François Role. 2002. Text-image coupling for editing literary sources. Computers and the Humanities 36.1: 49-73.

McGann, Jerome. 1983. A critique of modern textual criticism. Chicago: University of Chicago Press.

Monella, Paolo. 2008. Towards a digital model to edit the different paratextuality levels within a textual tradition. Digital Medievalist 4: http://www.digitalmedievalist.org/journal/4/monella/.

Reiman, Donald H. 1987. Romantic texts and contexts. Columbia: University of Missouri Press.

Rosenberg, Bob. 2006. Documentary editing, in Burnard, O'Brien O'Keeffe, and Unsworth 2006.

Schlitz, S. A., and G. S. Bodine. 2009. The TEIViewer: Facilitating the transition from XML to web display. Literary and Linguistic Computing 24.3: 339.

Small, Ian, and Marcus Walsh, eds. 1991. The theory and practice of text-editing: Essays in honour of James T. Boulton. Cambridge: Cambridge University Press.

Vander Meulen, David L., and G. Thomas Tanselle. 1999. A system of manuscript transcription. Studies in Bibliography 52: 201-12.

Vanhoutte, Edward. 2006. Prose fiction and modern manuscripts: Limitations and possibilities of text-encoding for electronic editions, in Burnard, O'Brien O'Keeffe, and Unsworth 2006.
