Introduction: main scientific and editorial issues
The archive of the Abbey of Saint-Denis
§ 1 The Abbey of Saint-Denis, to the north of Paris, was for more than a thousand years a wealthy landowner whose estates extended not only across Île-de-France but also as far as Berry and Picardie, England and Alsace. It was also a place of great importance to the royal dynasties of France, and a very active scriptorium that eventually became the official centre for writing the history of the kingdom.
§ 2 The abbey built up a notably rich and exceptionally well-preserved archival fonds. Its finest treasures are a unique collection of early medieval charters (including all the original documents on papyrus kept in France today and all known originals from the Merovingian sovereigns), a considerable body of charters from later centuries (from the eleventh to the fifteenth), and a large series of administrative documents, including a rich series of accounts beginning in the 1220s (and becoming denser from the 1280s).
§ 3 However, the series of charters was seriously dismembered. The first dismemberment occurred when King Louis XIV assigned the possessions attached to the office of the abbot, which had become vacant, to the Maison Saint Louis de Saint Cyr: the documents concerning these possessions were transferred there (they are now at the Archives départementales des Yvelines). During the Revolution the abbey, its buildings, properties and archival fonds were all nationalized. As happened with other religious communities in Paris and its suburbs, its documents were transferred to the new French National Archives. Some misappropriations also occurred at this time, so that, for instance, some charters can now be found at the Archives départementales of Paris and at the Archives municipales of Saint-Denis. The worst damage was done by the archivists themselves, who, until the end of the nineteenth century, readily reclassified and dismembered the archival fonds of Saint-Denis: they removed documents from it (integrating them into the N, Q, L or K collections) for the light they shed on topography, ecclesiastical history and the history of the kingdom, and grouped the rest into the S series, more or less following earlier classifications.
§ 4 For two centuries, historians working on Paris have been confronted with this dismemberment of the collection, which slows down any reconstruction of the original series of charters. The work is slower still if the editor takes into account the copies of original documents now lost; thousands of them are presumably to be found in the French National Library, hidden among the papers of seventeenth- and eighteenth-century scholars. Added to the perverse effects of an early centralization that subordinated the history of Paris to the history of the kingdom, this difficulty has considerably hindered documentary editing. Consequently, few efforts have been made, and they have been limited either to one type of document, such as the cartularies of the cathedral chapter of Paris, edited without any search for the originals (Guérard 1850), or to rather modest archival fonds, such as the charters of Saint-Magloire (Terroine, Fossier and Montenon, 1966-1998).
The archive and the École nationale des chartes
§ 5 In 2004, the École nationale des chartes (ENC) began building digital corpora of all sorts of documents (catalogues of engravings, lexicographic tools, etc.). For medieval archives, it decided to concentrate its efforts on the charter fonds of the Paris region. Two programmes were launched: a digitization of the documents already edited (Cartulaires franciliens, i.e. “Digitized cartularies of Île-de-France,” see http://elec.enc.sorbonne.fr/cartulaires/) and an edition of the charters of Saint-Denis.
§ 6 The Abbey of Saint-Denis was virgin territory: its medieval archives had never been edited, apart from royal and papal acts and a few documents from Abbot Suger. The fonds was also obviously rich, including an exceptional series of medieval and modern archival inventories. Moreover, the abbey had a rich historiography and had recently been the subject of archaeological research that offered many keys to understanding the documents.
Editorial ground rules
§ 7 The project, which we began in 2006, is based on new editorial rules that depart considerably from the usual method. The usual approach, in short, consists of searching for and editing the texts for which an original charter survives, then adding the edition of the documents known only through one or more copies.
§ 8 Because of the great quantity and dispersal of the documents, this method, had we followed it, would have meant many years, perhaps several decades, of work before we could publish a suitably careful edition. Given the importance of this material, we had to go much faster: we wanted to make as many texts as possible available to researchers as quickly as possible.
§ 9 For this reason, we decided to base the edition not on the scattered original charters but on an impressive medieval compilation, the Cartulaire blanc (White Cartulary [CB]), which was finished about 1275 and updated until 1300, and which provides about 2,600 copies of charters transcribed by a small number of monks. Some documents are therefore left out (at least 500 more charters were probably kept in the archival fonds during the same period without being copied into the cartulary). Besides relying on the CB, our edition is a digital one, which allows us to publish progressively instead of waiting until all the work has been completed: as soon as a chapter has been edited, its edition is published online, and it can be corrected and enriched afterwards.
§ 10 The edition is also significant for its integration of high quality digital surrogates (images) of the whole original manuscript from the beginning of the process. This has two purposes: first, to allow navigation between the edited texts and their primary source (and thus to provide a check for the quality of our edition); and secondly to give online access to the material in chapters that have not been edited yet.
§ 11 To help researchers search for and retrieve any text within the CB quickly and accurately, our edition also makes the most of the immense work done by the monks responsible for the Inventaire général (General Inventory [IG]) from 1680 to 1710 (Guyotjeannin 1999; Guyotjeannin 2002). This French-language document consists of a chronologically ordered series of calendars drawn from several precisely identified sources: the original charters, the copies in the CB, and a few minor compilations. This inventory has been digitized and its images are now part of the digital edition. Its calendars have also been edited, simply with the aim of obtaining a careful transcription, accompanied by the comments necessary to understand and use its content.
The digital corpus
§ 12 To publish this large and composite corpus appropriately, we have defined models that take into account both the specific features of the two manuscripts and the needs of navigation between documents. The models are designed so that conforming files are reusable. We have built a web application that can store all the files (texts and images), index them, make them searchable and display them, and whose contents and functionality are easy to enrich.
§ 13 We now describe these choices in more detail.
| | The Cartulaire blanc (French National Archives, LL 1157 and LL 1158; mostly made between 1250 and 1300) | The Inventaire général (French National Archives, LL 1189, LL 1190 and LL 1191; mostly made between 1680 and 1720) |
|---|---|---|
| Number of images (and of pages, approximately) | | |
| Number of charters concerned | 2,600 charters or so (copies) | 3,285 calendars (description units) |
| Number of units that have been edited today | 411 charters (16%) | 3,285 calendars (100%) |
§ 14 Table 1: The composition of the digital corpus today. Printing the files of the partial edition of the charters of the Cartulaire blanc would produce 600 pages or so, without counting the historical introductions that precede each chapter. The printed edition of the entire Cartulaire blanc would probably be composed of more than 3,600 pages.
The digitized IG
§ 15 The IG of the charters of the abbey is a bulky manuscript in fourteen volumes. Only the first three volumes (kept by the French National Archives in Paris, shelf-marks LL 1189, LL 1190 and LL 1191), which cover charters from the seventh to the thirteenth centuries, are included in the project. The IG was written by a monk of the abbey, Dom François Thomas, and after him by a few anonymous monks. It was completed at the beginning of the eighteenth century, and a fair copy was then made, giving it the form in which we know it today. After that, only a few slight changes were made to the manuscript. During the nineteenth century, an archivist numbered the calendars in red ink in the margins, using a single continuous sequence. The website provides more information about this document (see http://saint-denis.enc.sorbonne.fr/les-textes/inventaire/presentation/distribution-des-volumes.html and the following pages).
Digitization practices and problems
§ 16 The digitization of this manuscript was funded by a grant from the French Ministry of Culture to the French National Archives, under its 2006 call for projects on cultural heritage digitization. The grant was awarded on condition that the work be done by a private company. The ENC joined the French National Archives and directed the operation. We chose our service provider (Dataland, since bought out by INOVCOM) through an open call for tender.
§ 17 Producing the digital images posed no particular difficulty. We asked our service provider to follow the usual method in such cases: digitize the bound volumes themselves; make one high-quality image per page, at a resolution of 300 dpi; store the images in LZW-compressed TIFF format so that they could be integrated into the French National Archives' information and preservation system; and derive JPG images from them for the future website.
§ 18 To digitize the text of the IG, we chose to apply a well-known and well-documented international standard: the EAD 2002 XML DTD (see http://www.loc.gov/ead/). Three arguments justify this choice:
- Despite some intrinsic limits linked to the peculiarities of eighteenth-century erudition and diplomatics, the IG is comparable to a modern analytic archival finding aid. Each volume contains a “flat” sequence of calendars arranged in the chronological order of the documents they describe. It was designed in an abstract way: it includes almost every witness of the texts, even when the original charter is lost, and never gives a physical description of the original documents or any information about their accession number or arrangement. Each description unit nevertheless contains, in the margin, the red-ink number added in the nineteenth century, followed by the calendar itself (for an example, see Figure 1; another example is available here: http://saint-denis.enc.sorbonne.fr/le-projet/aspects-informatiques/inventaire/structuration-du-texte.html). The calendar begins with the legal category of the document, continues with a summary of the charter, and gives the date (year) of the charter in the middle of the page. It ends with the list of witnesses (the tradition), always presented in this order: the original charter, the copies in cartularies including the Cartulaire blanc, and the editions. Supplementary information is sometimes provided in the margin or at the end of the description unit. Since the EAD 2002 DTD is the XML model widely used to encode finding aids, this analytic finding aid can naturally be encoded according to it.
- The EAD 2002 DTD suits both the nature of the document and our needs. We do not treat the IG as a primary source in which we would, for instance, look for traces of its authors' purposes or practices. Nor do we treat it as an illustration of the development of erudition or of the history of the abbey in the early modern period. Instead, the project focuses on the content of the IG, since it gives an accurate idea of the extent and composition of the series of charters available in the seventeenth century, and since it systematically refers to the CB. In other words, we view this manuscript as the seventeenth-century archivists who compiled it did: as a set of organized metadata describing other documents. Had we considered this inventory the main object of our study, we would probably have chosen another standard, such as TEI.
- The EAD 2002 DTD is the main XML reference model for the French National Archives, which hold the IG as well as the Cartulaire blanc and the main part of what remains of the Abbey of Saint-Denis archival fonds. The EAD files created during the project will thus easily be integrated into the French National Archives' information system, as other useful old finding aids will be.
§ 19 Since each volume has its own title page and is an intellectual unit, there is one EAD file per volume. Each description unit is encoded within a component (<c>) element with a unique identifier (an id attribute) identical to the red-ink number in the original manuscript. Each information segment considered useful for indexing and searching (the full red-ink number, the date, the summary, the list of witnesses, the additional information, the authorial notes) is isolated and entered in the appropriate EAD element.
§ 20 This initial work was done by our service provider. The IG not only provides the same pieces of information for each document it describes, but always gives them in the same order and position within the description unit, applying the same layout rules. This remarkable regularity made a direct, consistent and efficient encoding process possible, and the markup proved to be of very good quality.
§ 21 On the other hand, transcribing the manuscript proved very difficult for the company. The handwriting, although quite clear, is very different from modern handwriting. The text also contains a lot of technical and rather rare vocabulary (such as notes on the evidence used for dating, or phrases borrowed from medieval legal vocabulary) and a great number of proper nouns, some of them in Latin. We knew we would have to correct many transcription errors and check many uncertain readings, but their number turned out to be far higher than expected.
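To make the one-component-per-calendar layout concrete, here is a minimal Python sketch that reads a simplified EAD fragment and indexes the calendars by identifier. The choice of <unitid>, <unitdate normal="…"> and <unittitle> inside <did> for the number, date and summary is our assumption for illustration, not the project's documented mapping; the ids carry an "n" prefix because an XML ID value cannot begin with a digit.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical EAD 2002 fragment (no DTD declaration). The child
# elements of <did> are an assumed mapping, and the ids carry an "n" prefix
# because an XML ID value cannot begin with a digit.
SAMPLE = """<ead>
  <archdesc level="fonds">
    <dsc>
      <c id="n12">
        <did>
          <unitid>12</unitid>
          <unitdate normal="1125">1125</unitdate>
          <unittitle>Donation (sample summary)</unittitle>
        </did>
      </c>
      <c id="n13">
        <did>
          <unitid>13</unitid>
          <unitdate normal="1126/1130">vers 1126</unitdate>
          <unittitle>Confirmation (sample summary)</unittitle>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def index_calendars(root):
    """Map each calendar's id to its red-ink number, normalized date and summary."""
    index = {}
    for c in root.iter("c"):
        did = c.find("did")
        index[c.get("id")] = {
            "number": did.findtext("unitid"),
            "date": did.find("unitdate").get("normal"),
            "summary": did.findtext("unittitle"),
        }
    return index


calendars = index_calendars(ET.fromstring(SAMPLE))
print(calendars["n13"]["date"])  # 1126/1130
```

Because each segment sits in its own element, an index like this is all a search engine needs to filter calendars by date or number.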
Correction and enrichment
§ 22 In a second stage, an ENC student in medieval history and diplomatics, Sandrine della Bartolomea, carefully reread and corrected the EAD files against the images in order to obtain a transcription whose quality meets the standards usually applied to modern texts. She also enriched the encoded transcription to make the inventory easier to understand and use. We therefore now have XML files that include:
- Annotations about the authors' deletions and substitutions, as well as their syntactic errors;
- EAD linking elements making explicit the relations established by the authors from one calendar to another;
- Precise date information about the described documents, standardized according to ISO 8601 (given dates are adjusted when inaccurate or wrong, with the Gregorian calendar date added if necessary, and an approximate date range is assigned to documents whose description gives no date);
- Complementary bibliographical references added to the eighteenth-century list of witnesses and printed editions;
- Specification of the authors of the charters, when they are named in the calendars.
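The normalized dates can be checked mechanically. The Python sketch below parses a single ISO 8601 date (year, year-month or full date) or a "start/end" range, as EAD's normal attribute allows, and turns it into a first and last day. It assumes the values are already converted to the Gregorian calendar (it performs no calendar conversion itself) and relies on Python's proleptic Gregorian `datetime.date`.

```python
import re
from datetime import date

# The ISO 8601 forms assumed here: YYYY, YYYY-MM or YYYY-MM-DD.
_ISO = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_bound(value, upper=False):
    """Expand a possibly partial ISO 8601 date to a full date: the first day it
    covers (lower bound) or the last day it covers (upper bound)."""
    m = _ISO.fullmatch(value)
    if not m:
        raise ValueError(f"not an ISO 8601 date: {value!r}")
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else (12 if upper else 1)
    if not 1 <= month <= 12:
        raise ValueError(f"bad month in {value!r}")
    if m.group(3):
        return date(year, month, int(m.group(3)))  # rejects impossible days
    if not upper:
        return date(year, month, 1)
    # Last day of the month = the day before the first day of the next month.
    following = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return date.fromordinal(following.toordinal() - 1)


def parse_normal(normal):
    """Parse a single date or a 'start/end' range into (first day, last day)."""
    start, _, end = normal.partition("/")
    first, last = parse_bound(start), parse_bound(end or start, upper=True)
    if first > last:
        raise ValueError(f"range ends before it starts: {normal!r}")
    return first, last


print(parse_normal("1230-02"))
# (datetime.date(1230, 2, 1), datetime.date(1230, 2, 28))
```

Expanding every value to a concrete interval makes both validation (reversed ranges, impossible days) and date-range searching straightforward.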
§ 23 Each of the EAD elements containing those editorial additions is characterized by an attribute specifying that responsibility for the addition rests with the ENC. Retrieving and editing those additions, or even automatically extracting the transcribed text alone from the XML files, would thus be easy.
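Flagging every addition with an attribute makes extraction of the bare transcription a simple tree filter. The text does not say which attribute carries the ENC responsibility, so the sketch below assumes, hypothetically, `altrender="ENC"`; it removes each flagged element while keeping the text that followed it.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical convention: the text says each ENC addition carries an attribute
# naming the ENC as responsible, but not which one; altrender="ENC" is assumed.
ENC_ATTR, ENC_VALUE = "altrender", "ENC"


def _remove_keeping_tail(parent, child):
    """Detach child but keep the text that followed it (its tail)."""
    tail = child.tail or ""
    siblings = list(parent)
    i = siblings.index(child)
    if i > 0:
        siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def strip_additions(root):
    """Return a copy of the tree without any ENC-flagged element."""
    tree = copy.deepcopy(root)
    for parent in list(tree.iter()):
        for child in list(parent):
            if child.get(ENC_ATTR) == ENC_VALUE:
                _remove_keeping_tail(parent, child)
    return tree


SAMPLE = ('<p>Donation<persname altrender="ENC"> [Hugo]</persname>'
          ' to the abbey.</p>')
root = ET.fromstring(SAMPLE)
print("".join(strip_additions(root).itertext()))  # Donation to the abbey.
```

The function works on a copy, so the enriched file is left intact and both versions can be served from the same source.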
§ 24 This work was done part-time over two and a half years, the equivalent of approximately sixteen person-months of full-time work. At the end of the process, a series of tests was performed to check the quality and consistency of the files. Finally, to make navigation within the files easier and to offset the lack of internal structure (there are no chapters or sections in this inventory), we grouped the description units, without changing their order at all, into about twenty blocks, each consisting of a parent <c> corresponding to a historically significant date range and its child elements. In doing so we followed the practice of many contemporary printed analytic finding aids, which often group descriptions by period (e.g. by reign) even when the records are not arranged that way. This produced a two-level table of contents for each of the three volumes, which is far easier to browse than a flat list of hundreds of entries. The grouping is also easily reversible and does not affect searching.
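The grouping step can be sketched as a single pass over the flat sequence: a new parent <c> is opened only when the period changes, so the original order is preserved, and the operation can be undone by lifting the children back out. The period boundaries below are placeholders (the project's actual ranges are not listed here), and reading the year from `unitdate/@normal` is an assumption.

```python
import xml.etree.ElementTree as ET

# Placeholder periods (label, first year, last year); the project's actual
# twenty or so historically significant ranges are not listed in the text.
PERIODS = [
    ("Before 1000", 1, 999),
    ("1000-1099", 1000, 1099),
    ("1100-1199", 1100, 1199),
]

SAMPLE = """<dsc>
<c id="n1"><did><unitdate normal="0950"/></did></c>
<c id="n2"><did><unitdate normal="1020"/></did></c>
<c id="n3"><did><unitdate normal="1050/1060"/></did></c>
<c id="n4"><did><unitdate normal="1150"/></did></c>
</dsc>"""


def period_of(c, periods):
    """Label of the period containing the first year of the normalized date."""
    year = int(c.find("did/unitdate").get("normal")[:4])
    for label, first, last in periods:
        if first <= year <= last:
            return label
    raise ValueError(f"no period covers year {year}")


def group_components(dsc, periods):
    """Wrap runs of consecutive <c> children into one parent <c> per period.
    The order of the calendars is never changed."""
    flat = list(dsc)
    for c in flat:
        dsc.remove(c)
    current, block = None, None
    for c in flat:
        label = period_of(c, periods)
        if label != current:
            block = ET.SubElement(dsc, "c", level="otherlevel")
            ET.SubElement(ET.SubElement(block, "did"), "unittitle").text = label
            current = label
        block.append(c)
    return dsc


def ungroup(dsc):
    """Reverse group_components: put the calendars back in one flat sequence."""
    for block in list(dsc):
        dsc.remove(block)
        dsc.extend(block.findall("c"))
    return dsc


grouped = group_components(ET.fromstring(SAMPLE), PERIODS)
print([b.findtext("did/unittitle") for b in grouped])
# ['Before 1000', '1000-1099', '1100-1199']
```

Because the blocks only wrap existing components, a full-text index built over the leaf <c> elements is unaffected by the grouping.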
The advantages of using an external vendor
§ 25 Having completed this process, and with hindsight, we still believe that entrusting a service provider with transcribing and encoding the manuscript was a good choice, despite the defects of the initial transcription. After only a few months and for a reasonable price we had a complete transcription of the texts in valid and accurately structured EAD/XML files. Doing this work ourselves would have taken much longer. As a result, by the beginning of 2008 we had an overall view of this part of the corpus and were able to begin designing the website.
The digitized CB
§ 26 The CB is the richest and most carefully made medieval cartulary that has come down to us from the abbey of Saint-Denis. It consists of two thick, large-format, beautifully laid-out volumes (kept by the French National Archives in Paris, shelf-marks LL 1157 and LL 1158, and available only on microfilm until our website was released). At first sight the CB seems to have great formal unity. But a combined study of its content, layout and handwriting allows us to distinguish four stages in its making. The first copyist laid the foundations of the work, and the distribution of the charters into chapters was only slightly modified after him. Most of his transcriptions are in chronological order; they cover 1,414 of the 2,616 charters, i.e. 54% of the present content of the cartulary. His work, which he stopped around 1250, seems then to have been abandoned, and was continued by a second monk in the 1270s. That continuator also created the initial table of contents; he copied 617 charters, i.e. almost 24% of the whole. During the following decade, a third copyist transcribed 372 more charters (14%), creating a few new chapters. He also completed the Ancien inventaire noir, which gives summaries of every charter copied up to that point. The last stage, which took place in the 1290s and stopped suddenly in 1300, consists mainly of supplements transcribed roughly contemporaneously.
§ 27 As already mentioned, the CB is subdivided into thematic or, much more often, topographical chapters (see http://saint-denis.enc.sorbonne.fr/les-textes/cartulaire-blanc/parcourir/table-des-chapitres.html). Several clues (such as notes written in the margin of the table of contents) show that these chapters follow the order in which the charters were stored: each corresponds to a container, either a box (layette) or a chest. The pagination still in use today, which the IG also uses to refer to the CB, was added during the sixteenth or seventeenth century. Within each chapter, the charters were for the most part copied in strict chronological order. Most copies are preceded by a rubric in red ink and have a sequence number in Roman numerals, which can sometimes also be found on the back of the original charters and which we use in our edition. The website provides more information about this document (see http://saint-denis.enc.sorbonne.fr/les-textes/cartulaire-blanc/presentation/problemes-de-date-et-fabrication.html and the following pages).
§ 28 At the beginning of 2007, the French National Archives, which had already obtained digital surrogates of the whole CB, provided us with a copy of those images: colour JPG files at a resolution of 150 dpi, one image per page.
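The proportions quoted for each stage can be recomputed from the raw counts. In this small Python check, the first three counts are those given in the text; the fourth stage is not counted explicitly there, so its figure is our own derivation as the remainder.

```python
# Counts given in the text; the fourth stage's count is not stated there and
# is derived here as the remainder.
TOTAL = 2616
STAGES = {"first copyist": 1414, "second copyist": 617, "third copyist": 372}


def stage_shares(stages, total):
    """Percentage of the cartulary's copies attributable to each stage."""
    counts = dict(stages)
    counts["fourth stage"] = total - sum(stages.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, pct in stage_shares(STAGES, TOTAL).items():
    print(f"{name}: {pct}%")
# first copyist: 54.1%
# second copyist: 23.6%
# third copyist: 14.2%
# fourth stage: 8.1%
```

On these counts the second copyist's share comes to about 23.6%, and the last stage accounts for the remaining 213 copies.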
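Since the Roman sequence numbers are reused in the edition, they must often be turned into integers, for sorting or for building identifiers. Here is a minimal Python converter. It accepts both additive (IIII) and subtractive (IV) forms and, as an assumption reflecting common medieval practice, reads a final "j" (as in "iij") as "i". The sample numbers are hypothetical.

```python
# Values of the Roman numeral letters; medieval scribes often wrote a final
# "i" as "j" (e.g. "iij" for 3), so "j" is read as "i".
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral, additive (IIII) or subtractive (IV), to an int."""
    s = numeral.strip().upper().replace("J", "I")
    total = 0
    for i, ch in enumerate(s):
        value = VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4, XL = 40).
        if i + 1 < len(s) and VALUES[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


# Hypothetical sequence numbers read from rubrics within one chapter.
numbers = ["x", "iij", "xlij", "iiij"]
print(sorted(numbers, key=roman_to_int))  # ['iij', 'iiij', 'x', 'xlij']
```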
Editing in MS Word
§ 29 In 2007, two chapters of the CB (Rueil and Tremblay) had already been edited in MS Word. Gautier Poupeau had manually converted these files into TEI P4 XML files and published them online using the TELMA platform. The Word critical edition of the chapters Beaurain and Pierrefitte was finished during the summer of 2007. Several other Word files were prepared between 2007 and 2011, including the editions of the chapters Dugny and Saint-Martin-du-Tertre.
§ 30 The editorial work is coordinated, checked and harmonized by the scientific leader of the project, Olivier Guyotjeannin. It is shared among several people, most of them students of the ENC, with some contributions from external researchers. This mirrors the internal organization of the cartulary itself: each chapter is edited by a group of people. MS Word and OpenOffice.org remain natural and indispensable tools.
§ 31 As already mentioned, our method differs from the one usually followed in France to make a critical and diplomatic edition of a series of medieval charters, as set out in École nationale des chartes (2001). To move quickly without being overwhelmed by the great quantity of dispersed original charters, we concentrate on the copies contained in the CB. Those copies, which are of course given the siglum B in the stemma, serve as reference witnesses.
§ 32 This principle, together with the decisions to edit and use the IG and to make a digital edition, led us to establish the following editorial rules:
- If a quick, non-exhaustive search enables us to find the original charter, that charter is of course mentioned in the tree of the tradition (with the siglum A), but its readings are taken into account only when the CB gives an odd or very different text, in other words rarely;
- Other witnesses are disregarded, except the relevant calendars in the three main inventories of the charter series, of which the IG is the third and most recent. A specific section of the stemma is dedicated to those calendars. The calendars in the first two inventories are edited in full in that section;
- To aid navigation and searching on the future website, two pieces of information are added to the usual metadata, in a Word table stored separately from the chapter file: a very short summary of the content of the document, from which a global list of the charters is generated; and one or more terms specifying the category to which each author of the document belongs (using the same vocabulary as the one used to index the calendars of the IG);
- Historical notes, particularly those on places and settlements, are to be as precise and rich as possible;
- An introductory text is to contextualize the edited documents of each chapter and suggest a few avenues of research.
§ 33 Choosing TEI to encode the edition of the CB was natural for an institution accustomed to using TEI for all its electronic editions. However, in 2007, there was no specific XML schema to control and guide our encoding. The same was true of all the digital textual corpora that the ENC had already encoded in TEI: the previous years had been a period of experimentation.
§ 34 In November 2007, the TEI Consortium released the Guidelines and source of TEI P5; we therefore had to adopt a new generic conceptual model and plan the conversion of our existing P4 files to P5.
§ 35 For the recent critical edition of the CB, the modelling and encoding work proceeded rather slowly, in three stages. During the last one (from June 2010 to spring 2012), we adjusted the rules we had defined, among other things to bring them as close as possible to the rules that had meanwhile been defined for the Cartulaires franciliens project (see http://elec.enc.sorbonne.fr/cartulaires/schema).
§ 36 The TEI model we now use is the result of that work. The ODD file, which includes documentation in French, and the Relax NG schema generated from it are downloadable from the website, along with the six TEI files we have already produced.
Structure of the TEI files
§ 37 Let us now mention some characteristics of our model.
§ 38 The first-level logical unit within the CB, and hence our work unit, is the chapter; each chapter is edited by a specific group of people at a specific moment in the project. There is therefore one TEI file per chapter. This also makes it easy to include in each file the historical introduction written for that chapter.
§ 39 In our project, a TEI file is structured around a wrapping <group> element containing a series of <text> elements that follow the chronological sequence of the charters; there is therefore one <text> element per charter. Each <text> element has an identifier (an xml:id attribute) whose value begins with a code specifying the chapter in which the charter is found. If we later want to create a master <teiCorpus> file that uses XInclude to include the chapter files, the identifiers will thus remain unique.
§ 40 Within that <text> element, the <front> element encodes the analytic metadata that precede the transcription of each charter in a diplomatic edition. We could have used the <sourceDesc> element in the <teiHeader> to encode these metadata, as is usually done or at least suggested in the TEI Guidelines; we preferred to make self-contained blocks holding everything needed to build a diplomatic edition. It would nevertheless be easy to move those metadata from the <front> elements to a sequence of <msPart> elements in an <msDesc> wrapper within the <sourceDesc> element.
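The benefit of chapter-prefixed identifiers shows up as soon as chapter files are assembled. The Python sketch below builds a master <teiCorpus> with XInclude, using the standard library's `xml.etree.ElementInclude` and an in-memory loader, and then checks that no xml:id occurs twice. The chapter files are heavily simplified (no <teiHeader>), and the codes "dug" and "smt" and the id pattern are hypothetical. Both chapters contain a first charter, yet their ids do not collide.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two hypothetical, heavily simplified chapter files (no teiHeader), kept in
# memory. The chapter codes "dug" and "smt" and the id pattern are assumptions.
CHAPTERS = {
    "dugny.xml": f'<TEI xmlns="{TEI_NS}"><text><group>'
                 '<text xml:id="dug-001"/><text xml:id="dug-002"/>'
                 '</group></text></TEI>',
    "saint-martin.xml": f'<TEI xmlns="{TEI_NS}"><text><group>'
                        '<text xml:id="smt-001"/>'
                        '</group></text></TEI>',
}

MASTER = (f'<teiCorpus xmlns="{TEI_NS}" '
          'xmlns:xi="http://www.w3.org/2001/XInclude">'
          '<xi:include href="dugny.xml"/>'
          '<xi:include href="saint-martin.xml"/>'
          '</teiCorpus>')


def memory_loader(href, parse, encoding=None):
    """Stand-in for ElementInclude.default_loader that reads CHAPTERS."""
    source = CHAPTERS[href]
    return ET.fromstring(source) if parse == "xml" else source


def build_corpus(master, loader):
    """Parse the master file and resolve its xi:include elements."""
    root = ET.fromstring(master)
    ElementInclude.include(root, loader=loader)
    return root


def duplicate_ids(root):
    """Return every xml:id value that occurs more than once."""
    seen, dupes = set(), []
    for el in root.iter():
        value = el.get(XML_ID)
        if value is not None:
            if value in seen:
                dupes.append(value)
            seen.add(value)
    return dupes


corpus = build_corpus(MASTER, memory_loader)
print(duplicate_ids(corpus))  # []
```

With files on disk, the default loader would be used instead of `memory_loader`.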
§ 41 The <body> element within the <text> contains, in the following order:
- The tree of the tradition. As that stemma is inseparable from the text, which cannot be understood without it, we consider that it does not belong to the metadata block. The tree of the tradition is rather extensive and has some specific features in our project. We use one <listWit> wrapping up to four specific <listWit> elements: one (optional) for the original charters, one (mandatory) for the copies, one (mandatory) for the inventories, and one (optional) for the authoritative editions. Within the inventories <listWit>, as already mentioned, we edit the text of two medieval inventories and refer to the IG.
- An optional <div> element containing some diplomatic comments;
- An optional <div> element (present for most documents) containing the transcription of the document. The critical apparatus notes are encoded within <app> elements in the transcription, following the parallel segmentation method. Each <app> element always contains one <lem>, which marks up either the reading of the CB or, when the editor prefers to give a different text, the editor's version, using elements from the core and transcr modules where necessary (e.g. <expan>, <subst>, <add>, <del>, <choice>, <sic>, <surplus>, etc.). If the <lem> element contains the editor's version, the reading of the CB is encoded within a <rdg> element with the appropriate wit attribute. Any other readings the editor mentions are encoded in additional <rdg> elements. More generally, we express in TEI, as accurately as possible, the readings given in the Word files and the information contained in the associated comments (the editorial annotations). The content of those comments is thus often rewritten as <rdg> elements. The exact text of those comments is systematically encoded within a <note> element, always inside the <app> wrapper. The same principles and methods are applied to the edition of the calendars found in the two medieval inventories mentioned above.
§ 42 The historical notes are encoded separately from the transcription (i.e. outside the <body> element), within the <back> element of each <text>. This follows the editorial process: historical notes are usually added once the text has been established, not written at the same time. Moreover, in our edition a single historical note often explains several passages (e.g. a named entity in the summary and another phrase referring to the same person or place in the transcription). Keeping the notes separate allows us to encode each note only once per charter and to link to it from as many places as necessary.
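A nested tree of tradition like the one just described lends itself to a simple structural check. The Python sketch below verifies that the inner lists are of known types, that the two mandatory lists are present, and that they appear in the order given above. The `type` values ("originals", "copies", "inventories", "editions") are hypothetical labels for the four lists. In practice such constraints would more likely be expressed in the project's ODD/Relax NG schema; this only illustrates the rule.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Hypothetical @type values for the four inner lists, in the order given above.
ORDER = ["originals", "copies", "inventories", "editions"]
MANDATORY = {"copies", "inventories"}


def check_tradition(outer):
    """List the problems found in a tree of tradition (empty list if none)."""
    types = [lw.get("type") for lw in outer.findall(f"{TEI}listWit")]
    problems = [f"unknown list type: {t}" for t in types if t not in ORDER]
    problems += [f"missing mandatory list: {t}"
                 for t in sorted(MANDATORY - set(types))]
    if len(types) != len(set(types)):
        problems.append("duplicate list type")
    known = [t for t in types if t in ORDER]
    if known != sorted(known, key=ORDER.index):
        problems.append("lists out of order")
    return problems


GOOD = ('<listWit xmlns="http://www.tei-c.org/ns/1.0">'
        '<listWit type="copies"><witness xml:id="B">Cartulaire blanc</witness></listWit>'
        '<listWit type="inventories"><witness xml:id="IG">Inventaire général</witness></listWit>'
        '</listWit>')
print(check_tradition(ET.fromstring(GOOD)))  # []
```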
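Because each note lives once in <back> and is pointed to from elsewhere, it is worth checking that every pointer resolves. The text does not name the linking mechanism. The Python sketch below assumes, hypothetically, <ref target="#id"> elements in <front> and <body> pointing to <note xml:id="…"> elements in <back>; it counts the uses of each note and reports dangling pointers.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_note_links(text):
    """For one charter's <text>, count the pointers aimed at each historical
    note in <back> and collect pointers whose target note does not exist.
    Assumes (hypothetically) that links are <ref target="#id"> elements."""
    notes = {n.get(XML_ID): 0 for n in text.findall("tei:back//tei:note", NS)}
    refs = (text.findall("tei:front//tei:ref", NS)
            + text.findall("tei:body//tei:ref", NS))
    dangling = []
    for ref in refs:
        target = ref.get("target", "")
        key = target[1:] if target.startswith("#") else target
        if key in notes:
            notes[key] += 1
        else:
            dangling.append(target)
    return notes, dangling


SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="dug-001">'
    '<front><div type="summary"><p>Gift by '
    '<ref target="#dug-001-n1">Hugo</ref></p></div></front>'
    '<body><div type="transcription"><p>Ego '
    '<ref target="#dug-001-n1">Hugo</ref> apud '
    '<ref target="#dug-001-n2">Dugny</ref></p></div></body>'
    '<back><div type="notes">'
    '<note xml:id="dug-001-n1">Hugo (sample note).</note></div></back></text>'
)
print(check_note_links(ET.fromstring(SAMPLE)))
# ({'dug-001-n1': 2}, ['#dug-001-n2'])
```

A count of zero would flag a note that nothing points to, and a dangling pointer flags a note that was never written.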
§ 43 The <back> element may optionally contain, in a specific <div>, some references to the appendices (such as family trees, maps, and photographs of the mentioned settlements) using <ref> elements to link their digital representation to the TEI file.
§ 44 In short, our model enables us to reproduce as faithfully, accurately, and simply as possible the structure of our Word edition and the phenomena and variants within the initial transcription without adding extra information. A clear, reliable and consistent basis is thus created which can be re-used elsewhere and can also be enriched (e.g. by encoding some readings coming from other witnesses) without any difficulty. It will also be easy, if needed, to extract the transcription of the CB itself from the TEI files.
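Extracting the CB's own text from the parallel-segmentation encoding amounts to choosing one reading per <app>: the <rdg> attributed to the CB if there is one (the <lem> then holds the editor's text), otherwise the <lem>. The Python sketch below assumes the CB witness is referred to as `wit="#B"`. It also ignores the finer rules that elements such as <choice> or <subst> inside a reading would need: it simply concatenates their content.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
CB_WIT = "#B"  # assumed pointer to the CB witness (siglum B)


def cb_reading(app):
    """The CB's reading in one <app>: an <rdg> attributed to the CB if there is
    one (the <lem> then holds the editor's text), otherwise the <lem>."""
    for rdg in app.findall(f"{TEI}rdg"):
        if CB_WIT in rdg.get("wit", "").split():
            return rdg
    return app.find(f"{TEI}lem")


def cb_text(elem):
    """Flatten elem to plain text, keeping only the CB reading of each <app>
    and skipping <note> elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{TEI}app":
            parts.append(cb_text(cb_reading(child)))
        elif child.tag != f"{TEI}note":
            parts.append(cb_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


SAMPLE = ('<p xmlns="http://www.tei-c.org/ns/1.0">Ego '
          '<app><lem>Hugo</lem><note>sample comment</note></app> dono '
          '<app><lem>ecclesie</lem><rdg wit="#B">eclesie</rdg></app>'
          ' sancti Dionysii.</p>')
print(cb_text(ET.fromstring(SAMPLE)))  # Ego Hugo dono eclesie sancti Dionysii.
```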
Converting from Word to TEI
§ 45 A few words on how we apply these rules to the Word files. So far a small group of people has done this work manually, apart from some final adjustments made with XSLT programs. At the beginning of the project this manual method was the only option, because we needed to evaluate potential problems and solutions. Encoding a chapter of about one hundred charter copies has proved to take approximately fifteen days. A student in medieval history and diplomatics, Laura Gili, encoded the editions of the chapters Dugny (published online in June 2010) and Saint-Martin-du-Tertre (published online in May 2012).
§ 46 The main difficulty comes from the fact that a file made with a word processor can be inherently ambiguous. When encoding it, one sometimes has to disambiguate it without misrepresenting the editor's intention. For example, there may be several ways of understanding, and therefore encoding, a critical apparatus note, or of establishing exactly where in the text a variant has to be anchored. These ambiguities can be resolved by going back to the manuscript through its digital surrogates, or by consulting the editor. The website helps to detect any interpretation errors that may remain. In any event, this experience has clearly and practically confirmed two of our convictions about using TEI to make critical editions of manuscripts:
- If a critical edition of the manuscript already exists, made with word processing software, one cannot encode this edition accurately without access to the manuscript or its digital surrogates, even if the edition is recent and of high quality;
- In such a situation, the editor should encode the file or at least be closely associated with the encoding process.
Relations among components of the corpus
§ 47 Another key purpose of the project was to enable navigation between the four subsets that make up the digital corpus: the collection of digital images made from the entire IG; the collection of images made from the entire CB; the EAD files forming the digital edition of the entire IG; and the TEI files forming the partial, in-progress digital edition of the CB.
§ 48 The following relations are therefore precisely expressed within the digital corpus:
- The relation between a physical page of the IG (or one of the CB) and the digital image that reproduces it;
- The relation between an intellectual unit within the IG (i.e. a calendar) and, first, the page(s) on which that unit is written, and, second, the intellectual unit within the CB to which it may correspond (i.e. the copy of the text of the charter within the CB). As mentioned above, the IG fortunately provides some direct information about those relations—though not all the information needed. As shown in Figure 1, the list of witnesses that is written at the end of each calendar mentions the copy of the charter within the CB (if this exists), and refers to the volume and page of the cartulary (and unfortunately not to the chapter and copy themselves).
- The relation between an intellectual unit within the CB (the copy of a charter) and, first, the page(s) on which that unit is written, and, second, the calendar within the IG to which that unit may correspond. For each of the edited documents within the CB, the editor identifies the existing description within the IG; for these texts only, we can use carefully checked information.
§ 49 In order to express this network of relations precisely and efficiently, we use the METS/XML standard (see http://www.loc.gov/standards/mets/), because it fits our needs exactly. At the heart of this model, which many digital libraries use, are two key concepts that can be applied to any documentary unit that exists in the real world and is digitized:
- A physical structure map (which contains the declaration of the physical components of the unit, such as pages);
- A logical structure map (which contains the declaration of the logical components of the unit, such as chapters).
§ 50 Both of these maps then specify by which files each of the components of the unit is represented, and to which other units those components correspond.
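The following minimal sketch makes the two maps concrete, again using `xml.etree.ElementTree`. It builds a METS file for one IG volume:

- a `<fileSec>` lists the images;
- the physical `<structMap>` declares pages, each pointing to its image file through `<fptr>`;
- the logical `<structMap>` declares calendars;
- `<structLink>`/`<smLink>` records which page(s) each calendar is written on.

The IDs and paths are hypothetical. So is the use of `<mptr>` with a fragment identifier to point to a charter in the CB METS file. The project's actual files may express these relations differently.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"

def xl(attr):
    """Qualify an attribute name with the XLink namespace."""
    return f"{{{XLINK}}}{attr}"

def build_mets(pages, calendars):
    """Build a minimal METS document for one IG volume.

    pages:     list of (page_id, image_path)
    calendars: list of (calendar_id, [page_id, ...], cb_target or None)
    """
    root = ET.Element(m("mets"))

    # fileSec: one image file per digitized page.
    file_sec = ET.SubElement(root, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": "image"})
    for page_id, path in pages:
        f = ET.SubElement(grp, m("file"), {"ID": f"img-{page_id}", "MIMETYPE": "image/jpeg"})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", xl("href"): path})

    # Physical structure map: the volume and its pages, each pointing to its image.
    phys = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(phys, m("div"), {"TYPE": "volume"})
    for page_id, _ in pages:
        page = ET.SubElement(volume, m("div"), {"TYPE": "page", "ID": page_id})
        ET.SubElement(page, m("fptr"), {"FILEID": f"img-{page_id}"})

    # Logical structure map: the intellectual units (calendars).
    logical = ET.SubElement(root, m("structMap"), {"TYPE": "logical"})
    inventory = ET.SubElement(logical, m("div"), {"TYPE": "inventory"})
    for cal_id, _, cb_target in calendars:
        cal = ET.SubElement(inventory, m("div"), {"TYPE": "calendar", "ID": cal_id})
        if cb_target:
            # Pointer to the corresponding unit declared in another METS file.
            ET.SubElement(cal, m("mptr"), {"LOCTYPE": "URL", xl("href"): cb_target})

    # structLink: the page(s) on which each calendar is written.
    struct_link = ET.SubElement(root, m("structLink"))
    for cal_id, page_ids, _ in calendars:
        for page_id in page_ids:
            ET.SubElement(struct_link, m("smLink"), {xl("from"): cal_id, xl("to"): page_id})
    return root

mets = build_mets(
    pages=[("p001", "images/ig-vol1/p001.jpg"), ("p002", "images/ig-vol1/p002.jpg")],
    calendars=[("cal-0001", ["p001", "p002"], "cb-mets.xml#charter-0012")],
)
print(ET.tostring(mets, encoding="unicode"))
```

Because the links live in the METS file rather than in the EAD or TEI files, correcting a relation only means editing these few elements.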
§ 51 We could have used the ad hoc linking elements within the EAD and TEI files to express these relations. Using METS has, in our opinion, three key advantages in comparison with that other method:
- Each of the EAD and TEI files remains totally independent of the others, so that it can be re-used outside the framework in which it is used today, separately and without the images;
- Managing the expressed relations is a separate task. If, for instance, we find an error in a link between a calendar of the IG and a charter of the CB, we only have to correct two lines in the IG and CB METS files, and the website can immediately return the updated HTML pages to the user's computer.
- The long-term storage of the digital corpus is facilitated. Each of the METS files contains a list of all the files composing a consistent digital set, and is used to encode the technical, descriptive, legal and preservation information about those files. In the coming years, we will probably formalize a contractual agreement with an archive, so that we can entrust to its care the files that we have produced and that have to be stored indefinitely, under conditions that guarantee their integrity and their accessibility for us and for anyone entitled to access them. As is usual, this institution will probably ask us to provide METS wrapper files for any submission information package to be stored.
Preparation of the METS files
§ 52 The METS files were prepared in two stages. The first is now finished. It was mostly carried out by our service provider during the digitization of the IG, since the original manuscript had to be leafed through, and even read, at that time. Within one METS file per volume of the IG, the provider created the list of image files, the physical structure map and a first state of the logical structure map, expressing the relation between each calendar and the page(s) of the CB.
§ 53 We also asked our service provider, who had the CB images, to create the list of image files and the physical structure map of the CB in another METS file. We then manually created the logical structure map of the CB METS file.
§ 54 The second stage of the work is carried out whenever a chapter of the CB is edited. We enrich the logical structure map of the CB METS file, adding the information that the editor provides in the tree of the tradition of the charters. We check the part of the IG METS files dealing with the calendars that describe these edited chapters. We then automatically modify the IG METS files so that they refer to the logical units of the CB (the charters) rather than to its pages.
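The automatic part of this second stage might look like the following sketch. It is written with plain Python dictionaries rather than actual METS parsing. It rests on three hypothetical inputs:

- the first-stage IG links map each calendar to the CB page it cites;
- the editor's identifications map calendars to charters;
- the enriched CB logical map gives the pages of each charter.

A calendar is retargeted only when its cited page belongs to the identified charter. Otherwise it is flagged for the manual check.

```python
def retarget(ig_links, calendar_to_charter, charter_pages):
    """Replace CB page references in the IG links by CB charter references.

    ig_links:            calendar_id -> CB page cited by the IG (first-stage METS)
    calendar_to_charter: calendar_id -> charter_id identified by the editor
    charter_pages:       charter_id -> CB pages of the charter (CB logical map)
    Returns the updated links and the calendars to be checked by hand.
    """
    updated, to_check = {}, []
    for cal_id, cb_page in ig_links.items():
        charter_id = calendar_to_charter.get(cal_id)
        if charter_id is None:
            # Chapter not edited yet: keep the page-level reference.
            updated[cal_id] = ("page", cb_page)
        elif cb_page in charter_pages.get(charter_id, []):
            # The page cited by the IG belongs to the identified charter.
            updated[cal_id] = ("charter", charter_id)
        else:
            # IG and editor disagree: keep the page and flag it for review.
            updated[cal_id] = ("page", cb_page)
            to_check.append(cal_id)
    return updated, to_check

# Hypothetical identifiers, for illustration only.
ig_links = {"cal-1": "cb-p012", "cal-2": "cb-p013", "cal-3": "cb-p200"}
identified = {"cal-1": "ch-0005", "cal-2": "ch-0007"}
cb_pages = {"ch-0005": ["cb-p012"], "ch-0007": ["cb-p014", "cb-p015"]}
links, to_check = retarget(ig_links, identified, cb_pages)
print(links, to_check)
```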
§ 55 These tasks are quite tedious, but essential. The result remains partial today, particularly because, when an unedited copy of a charter in the CB extends over more than one page, the IG (and thus the logical structure map of the IG METS files) refers only to the first corresponding page of the CB.
§ 56 The website was released in June 2010 and we are now preparing its second version.
Choice of platform: TELMA vs. Pleade
§ 57 The website does not use the TELMA platform, which was built by the ENC and the IRHT in 2006 but eventually proved too complex and unstable. Since 2009 the ENC has been developing another platform, Diple (see https://sourcesup.renater.fr/projects/diple/). However, when we began to design the website for our project, that work was far from finished and had not produced any usable tool; it still does not provide all the functionalities needed for our project.
§ 58 Our website uses the Pleade framework (see http://www.pleade.com). This open source, GPL-licensed software was written to index EAD files, make them searchable and display them in a dynamic web environment; it also natively includes an image server and viewer. By choosing this tool, we therefore obtained an appropriate and configurable solution for publishing the EAD files and the images. It is a Java servlet that currently uses the Cocoon XML publishing framework, the SDX Java library and the powerful Lucene search engine. It is easy to deploy in a J2EE environment (e.g. Tomcat as a servlet engine and Apache as an HTTP server), and it is highly configurable and adaptable. However, nothing or almost nothing in this solution (in the Pleade 3.3 or 3.4 GPL versions) is provided to index, search and display complex TEI files or to use METS files.
Principal features of the website
§ 60 Here is a list of the main functional features of the website, with a few details on their current and future characteristics:
Direct access to each of the four subsets (the two collections of images, the edition of the IG, the edition of the CB)
§ 61 Direct access to the images of the CB is imperative, since the critical edition will not be finished for a long time. Researchers can now work online on the two volumes, and we hope that this will give rise to new contributions, perhaps even new uses. Above all, this direct access to the two series of images enables researchers to validate and correct the edition. It also lets them learn much more about the physical documentary objects and about the way the text was written on its support.
§ 63 What is new is that each image viewer page of our website includes the metadata needed to list, in the top horizontal division of the screen, the intellectual units written on the digitized page, and to reach them via hyperlinks. A search form can also be used to enter the number of the digitized physical page to be displayed. To the right of the image section, a vertical division displays the edition of one of the listed logical units (calendar or charter) when the user clicks the appropriate link in the top horizontal division. On that webpage (see Figure 7), the digitized manuscript page is therefore contextualized: it is still the main object presented, but it can be juxtaposed with a view of part of the XML-encoded text it bears.
§ 64 Concerning direct access to each of the critical editions as a whole, the aim is to be as convenient as possible and to respect the nature and internal structure of the edition. Therefore, for each XML file, a clickable table of contents is displayed in the window, so that one can easily reach any of the intellectual units and display it in full on the screen.
§ 65 We also virtually recompose, thanks to the CB METS file, a cumulative chronological table of the edited texts of the CB. This frees the reader from being restricted by the thematic organization of the cartulary and provides another view on the corpus (see http://saint-denis.enc.sorbonne.fr/les-textes/cartulaire-blanc/parcourir/table-des-actes.html). This table of contents will soon be made dynamic, sortable and filterable.
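This recomposition can be sketched as a sort over the charters declared in the CB logical map. The sketch is illustrative only and rests on hypothetical data:

- each charter has a normalized date, either an ISO 8601-style string with a four-digit year or a start/end range separated by a slash;
- each charter has a position in the cartulary, which breaks ties between charters dated alike.

The optional chapter filter anticipates the planned filterable table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Charter:
    ident: str     # hypothetical charter identifier
    chapter: str   # thematic chapter of the cartulary
    position: int  # order of the copy within the cartulary
    date: str      # normalized date, e.g. "1248-05", or a range "1180/1190"

def start_date(date: str) -> str:
    """Return the start of a (possibly ranged) normalized date string.

    With four-digit years, ISO 8601-style strings sort chronologically,
    and a less precise date ("1180") sorts before a more precise one ("1180-03").
    """
    return date.split("/")[0]

def chronological_table(charters, chapter: Optional[str] = None):
    """Sort charters by start date, then by position in the cartulary."""
    selected = [c for c in charters if chapter is None or c.chapter == chapter]
    return sorted(selected, key=lambda c: (start_date(c.date), c.position))

charters = [
    Charter("ch-11", "Dugny", 11, "1248-05"),
    Charter("ch-10", "Dugny", 10, "1180/1190"),
    Charter("ch-40", "Saint-Martin-du-Tertre", 40, "1180"),
]
print([c.ident for c in chronological_table(charters)])
```

The thematic order (Dugny, then Saint-Martin-du-Tertre) gives way to a single chronological sequence across chapters.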
Direct access to each smallest intellectual unit
§ 66 We have rewritten the sitemaps so that the URLs used to access these intellectual units directly, as well as those for the edition of a chapter of the CB or of a volume of the IG, are in French, meaningful, and easy for the user to understand and memorize. These URL patterns use only path steps and slashes (see some examples at http://saint-denis.enc.sorbonne.fr/le-projet/aspects-informatiques/generalites/caracteristiques-de-l-application.html).
§ 67 Every useful piece of information encoded within this unit is therefore displayed in the HTML page. If this display does not conform to the canonical arrangement of the printed edition, it at least follows the logical reading order, and visually distinguishes what the original text contained from the editor's annotations.
§ 68 Within the edition of the CB, the focus is on the text of the main witness, the CB, or on the text proposed by the editor. The charters' stemma is enriched through an AJAX request: the HTML webpage corresponding to the edition of a charter is supplemented with a paragraph showing the text of the IG calendar.
§ 69 One can export the edition of a calendar of the IG to a PDF file, and we are going to add this functionality for the edition of the charters.
Search functionalities for each of the editions (quick “full-text” and multi-criteria forms)
§ 70 For instance, the text of the edition of the CB (http://saint-denis.enc.sorbonne.fr/recherche-avancee.html?document=cartulaire) can be searched through several indexes:

- the names of the chapters;
- the identifiers of the charters;
- the main language of their text;
- their dates;
- the categories of their author(s);
- the words in the editorial summaries;
- the words in the transcription (excluding the critical apparatus and the historical notes).

Thanks to the CB METS file, one can also search by page number and restrict the search to the charters that are described in the IG. The search engine supports wildcard, fuzzy and proximity searches. One can also browse the indexes of the authors of the charters and calendars directly.
§ 71 The search results pages will be improved to better meet the needs of searches within the text of the calendars or charters (we will display the relevant segments of text in a KWIC-like mode). We also plan to let users choose a sorting criterion before displaying the results of a search within the CB. This will make statistical research easier, for example on how the authors of the documents changed over time. We are also considering graphical representations of the application's contents. Widgets such as the Timeline and Exhibit widgets from the MIT Simile project (see http://simile-widgets.org/) are being tested to project the CB charters onto a horizontal timeline.
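A KWIC (keyword in context) display places each occurrence of the searched term between a fixed amount of left and right context. The sketch below is a generic illustration, not the Pleade or Lucene implementation. The `width` parameter, the whole-word case-insensitive matching and the Latin sample sentence are our own assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, match, right context) for each occurrence of keyword.

    Matching is case-insensitive and restricted to whole words.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

text = "notum facio quod decima de Duniaco ad abbatem pertinet"
for left, hit, right in kwic(text, "decima", width=10):
    # Right-align the left context so that matches line up in a column.
    print(f"{left:>10} [{hit}]{right}")
```

Returning tuples rather than formatted strings leaves the HTML layout (columns, highlighting) to the results page.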
Hyperlinks between the physical and logical components
§ 72 Thanks to the METS files, these links are displayed in the windows showing the edition of a calendar, the edition of a charter (within the tree of the tradition) and the images.
A web content management interface to publish new versions of the EAD and TEI files in a few seconds
§ 73 Before publishing a TEI file that corresponds to the edition of a new chapter, it is necessary to edit and modify only a few configuration files.
Assessment and prospects
§ 74 At this stage, we can assess the work done so far and outline new prospects.
Improvement in editorial workflow
§ 75 Six chapters (411 documents) of the CB have been edited, i.e. 16% of the texts of the cartulary. The edition of two other chapters (120 more charters) will be online within a year.
§ 76 On the whole, the nature of the scientific editors' work is unchanged and requires as much time as before. The undertaking will therefore still take many years.
§ 77 However, in the future the work will be easier thanks to two factors: easy access to the whole set of digital images and the existence of a TEI model. This will not only reduce effort but also save time. It will also bring greater consistency with the manuscript and better internal coherence.
§ 78 At least initially, the first stage will always consist of making files with word processing software. Converting those files into TEI will be possible by combining an automatic process with manual enrichment operations. These will mainly concern the critical apparatus and will be done by medievalists trained in TEI, probably Master 2 students or graduates of the ENC.
§ 79 In the medium term, it would obviously be of great benefit if the editors of the CB could encode the edition themselves, and if the encoding could be done much earlier in the editing process. This would imply either implementing an ergonomic, collaborative editing interface (which we have neither found nor developed yet), or training the editors to create TEI files with an XML editor such as oXygen.
§ 80 It must be emphasized that for the last few months the scientific director of the project has been able to handle the EAD and TEI files, which is far from usual practice. He is now able to work with the TEI files directly, and can also use the web content management interface.
§ 81 We also expect our TEI model to evolve over time. Various enrichments could, for instance, be considered to mark up the named entities or the segments of the diplomatic discourse. The text could then be processed, outside its present web environment, with some textometric or linguistic tools developed within other projects at the ENC.
§ 82 Within the strict framework of the project, now that the digital surrogates of the CB have become part of the edition, it would be worthwhile to take better account of the inscription of the text on its physical support. Encoding page or even column breaks would obviously be very useful.
§ 83 Besides, taking original charters into account directly is now possible. The documentary models and the web application can easily be used, after a quick adaptation, to integrate a new digital subset into the corpus, whether this is composed of the digital images of the original charters only or of their critical edition as well. For this work, which could only be partial and which would require reinforced human resources, editing distinct TEI files would be more in accordance with the nature of the project. This would finally enable us to bring together and to link up to three witnesses of the tradition of some texts. Such an evolution would move the project farther from the usual editorial methods.
Historical research results
§ 84 Concerning the historical investigation of the series of medieval charters, it is too soon to estimate the benefit of the operation. However, we can say that during our work we were able to check some hypotheses, such as the great reliability of the lists of witnesses in the IG calendars, at least for the 411 charters already edited. Previously unedited documents that were hard to access are now accessible. The functionalities of the application also provide more reliable and faster answers to scientific questions than the means available until now. The chronological table of the edited charters is a key tool for gaining an overview of the charters and establishing connections. Such means will certainly help us better understand some aspects of the history of the management of the abbey's property (e.g. the policy of land occupation and the abolition of the office of mayor, the conflicts over tithes, the relationship with the monarchy).
§ 85 We have already mentioned some of the changes that we have begun to make to the website, to be introduced by the end of 2012. Another line of work concerns the interoperability of the digital corpus. Pleade natively includes an OAI-PMH repository for EAD files. Starting from it, we will build an OAI-PMH repository containing, for each unit of the digital corpus, a dynamically generated record in Dublin Core and in qualified Dublin Core (one record per volume, one per chapter, one per calendar and one per charter). This will enable several service providers (such as MICHAEL or GALLICA) to harvest fairly precise metadata about the corpus, index them and make them available alongside other metadata.
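Each record in such a repository would be an `oai_dc:dc` element containing simple Dublin Core elements. The sketch below builds one record for a charter with `xml.etree.ElementTree`. It covers only the metadata payload: the OAI-PMH envelope, record header and `schemaLocation` are omitted, and qualified Dublin Core is not shown. The identifiers, values and URLs (on the placeholder domain example.org) are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

def dc_record(unit):
    """Build an oai_dc metadata record for one unit of the corpus.

    `unit` maps Dublin Core element names to a string or a list of strings.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in unit.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {element}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return root

charter = {
    "title": "Charter 12, chapter of Dugny (hypothetical example)",
    "type": "Text",
    "language": "la",
    "date": "1248",
    "subject": ["tithes", "Dugny"],
    "identifier": "https://example.org/cartulaire-blanc/dugny/12",
    "relation": "https://example.org/cartulaire-blanc/dugny",
}
rec = dc_record(charter)
print(ET.tostring(rec, encoding="unicode"))
```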
§ 86 We also plan to build a model to generate RDF files that could accurately represent the world of the Saint-Denis medieval charters. We will use several ontologies and vocabularies, including the CIDOC-CRM model, the recent and interesting LOCAH ontology, the Archives de France's archival thesaurus and perhaps the VID. For the moment at least, we simply aim to run a test, without any precise purpose or schedule. However, RDF may help us find ways to obtain useful and rich graphical representations of this complex set of objects, their properties and their incoming and outgoing relations. Besides, the ENC may soon have to explore the possibility of using these RDF ontologies and vocabularies in other projects and for other corpora. In any case, the Saint-Denis RDF files could also be used directly by the CNRS ISIDORE search engine (at least, we hope so), instead of the poorer OAI-PMH repository records.
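As a first idea of what such files could contain, the sketch below serializes three N-Triples statements about one charter:

- its type, using the CIDOC-CRM class E31 Document;
- a label;
- its membership in a chapter, using Dublin Core Terms `isPartOf`.

The sketch is not the planned model: the choice of properties is ours, LOCAH and the other vocabularies are not used, and the resource IRIs (on the placeholder domain example.org) and the label are hypothetical.

```python
# Vocabulary IRIs (RDF, RDFS, Dublin Core Terms, CIDOC-CRM).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
CRM_DOCUMENT = "http://www.cidoc-crm.org/cidoc-crm/E31_Document"

# Hypothetical base for the corpus's own resource IRIs.
BASE = "https://example.org/saint-denis/"

def iri(value):
    """Serialize an absolute IRI for N-Triples."""
    return f"<{value}>"

def literal(text, lang=None):
    """Serialize a string literal, escaping the characters N-Triples requires."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")

def charter_triples(charter_id, chapter_id, label, lang="fr"):
    """Return N-Triples lines describing one charter and its chapter."""
    subject = iri(f"{BASE}charter/{charter_id}")
    return [
        f"{subject} {iri(RDF_TYPE)} {iri(CRM_DOCUMENT)} .",
        f"{subject} {iri(RDFS_LABEL)} {literal(label, lang)} .",
        f"{subject} {iri(DCT_IS_PART_OF)} {iri(f'{BASE}chapter/{chapter_id}')} .",
    ]

for line in charter_triples("ch-0012", "dugny", 'Hypothetical charter "A"', lang="en"):
    print(line)
```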
§ 87 Finally, as we think that it could help any entity which has to publish online both EAD archival finding aids and TEI P5 editions of the documents that the EAD finding aids describe, we have shared our code with the main developers of Pleade software, and do hope that this will help them to make it generic and integrate it into the software source code, as a TEI module.
§ 88 Here are a few statements we wish to make as a conclusion.
§ 89 First, regarding the process. This project has been a very interesting adventure, which enabled us to think about various issues that had been little or not at all explored at the ENC. Rigorous processes have been necessary, taking into account scientific principles, documentary standards, good technical practices and methods, and the available resources. We have also had to show great flexibility. The project's participants have never had, and never will have, much time to work on it. Consequently, carrying out the project has taken longer than it would have if the various tasks had been carried out continuously. Yet, because of this, new ideas have had space to develop. We have also been able to assess and make full use of the tools available to us, keeping in mind our project objectives and the need for pragmatism. A major factor in this success has been the collaboration within a small group of people. All of the participants have acquired useful knowledge and skills that they will be able to use in the future.
§ 90 Secondly, regarding the critical edition of the Cartulaire blanc. The choices we have made are rather different from the classical editorial rules, but we have made them reversible, since they are expressed in separate XML segments within the TEI file. More generally, the old debates have lost their importance in the face of this multidimensional, evolving and self-contextualized work. These include debates on the order to be followed in the editorial process (chronological, or following the internal organization of the compilation), on the rendering of written forms, and on the necessity of having edited everything before publishing. The CB was originally the object of the edition; it has since become its starting point. Later on, we hope to add and interconnect new modules (for instance, one focused on the original charters). Our existing architecture should then make it possible to integrate and overcome the old problems, namely the nineteenth-century dispersals and the medieval and modern reclassifications, allowing a global reconstitution of the series of charters.
. The history of the Abbey is very rich and varied, but there is no recent, general and accurate account. The best starting point, combining history with topography and archaeology, is Atlas historique de Saint-Denis, des origines au XVIIIe siècle (Bernardi et al. 1996).
. Among the candidate providers, only one was an academic centre, which obviously would have had the scientific skills to do a better transcription, but its quote was far too expensive for us.
. Although, of course, the ENC has considerable in-house expertise in paleography and critical editing, this does not mean that we would have been able to find four or five people having both the scientific and “technical” skills (e.g. knowing EAD) and the time to do this work (which might have involved five or six months of full-time work per person). Even if we had, we can say for sure that using such personnel for this work would have been far more expensive than contracting with a service provider and asking a student to read and correct the files afterwards on a part-time basis.
. The lack of open source XML editing software that is stable and well documented, fully TEI P5 compliant, generic, really user-friendly, ergonomic, and easy to customize and deploy has often been emphasized in France (where there were several discussions and meetings about this problem in 2011) and abroad. Today in France there are a few tools that go a bit further than oXygen regarding the editing process, but they are still works in progress. We also know that several initiatives abroad, among them projects at King's College London, are working on this problem. For the moment, we cannot afford to build this kind of very generic tool ourselves. We are carefully following all these projects.
. Among other possibilities, within the OMNIA project, the ENC and other partners are creating the resources and tools needed to lemmatize medieval Latin textual corpora, in order to better understand the semantics of that language and of the corpora that use it. When completed, those tools could of course be applied to the existing edition of the Latin charters of the CB.
. See http://www.cidoc-crm.org/official_release_cidoc.html: CIDOC-CRM is a generic model for cultural heritage information, developed by the International Council of Museums; it is today an ISO standard (ISO 21127:2006).
. That ontology was developed and released in July 2011 within the framework of a British JISC-funded project, Linked Open Copac Archives Hub (LOCAH). Its aim was to make the data of two academic catalogues, Archives Hub (http://archiveshub.ac.uk/) and COPAC (http://copac.ac.uk/), available as Linked Data. As far as we know, this is the first research that has led to an ontology for archival resources. See http://archiveshub.ac.uk/locah/; the results are available at http://data.archiveshub.ac.uk/.
. See http://www.archivesdefrance.culture.gouv.fr/thesaurus/. This thesaurus is used to index French local archival fonds; it was recently released as an XML/SKOS file by the Service interministériel des Archives de France.
. VID (http://www.cei.lmu.de/VID/) is an online XML/SKOS version, made by Georg Vogeler (Ludwig-Maximilians-Universität München, Munich, Germany) of the Vocabulaire international de la diplomatique, ed. Maria Milagros Cárcel Ortí, 2. ed., Valencia 1997. It is incomplete for the moment.
. See http://www.rechercheisidore.fr/. ISIDORE is the Linked Data-based search engine developed by the French CNRS TGE ADONIS (http://www.tge-adonis.fr/); it aims to convert into RDF, index and make searchable the data representing the works and corpora made, or the resources used, by the French researchers in humanities and social science.
We want to thank Olivier Guyotjeannin for contributing to the content of this article and Jeanne-Marie Clavaud for helping us by reading the English draft.
École nationale des chartes, Groupe de recherches La civilisation de l’écrit au Moyen âge. 2001. Conseils pour l'édition des textes médiévaux. Fascicule I, Conseils généraux. Fascicule II, Actes et documents d'archives. Paris: Comité des travaux historiques et scientifiques: École nationale des chartes.
Guyotjeannin, Olivier. 1999. “La science des archives à Saint-Denis (fin du XIIIe-début du XVIe siècle).” In Saint-Denis et la royauté: études offertes à Bernard Guenée, edited by Françoise Autrand, Claude Gauvard and Jean-Marie Moeglin, 339-353. Paris: Publications de la Sorbonne.
Guyotjeannin, Olivier. 2002. “La tradition de l’ombre: les actes sous le regard des archivistes médiévaux (Saint-Denis, XIIe-XVe siècles).” In Charters, cartularies and archives: the preservation and transmission of documents in the medieval West: proceedings of a colloquium of the Commission internationale de diplomatique (Princeton and New York, 16-18 September 1999), edited by Adam J. Kosto and Anders Winroth, 81-112. Toronto: Pontifical institute of mediaeval studies.