§ 1 The purpose of this paper is to outline what might be called the
constructional principles of the present, online Anglo-Norman
Dictionary, or AND (www.anglo-norman.net).
Essentially, as the title perhaps suggests, this has to do with the relationship
between (i) the texts on which the dictionary draws (whether these have been
digitized or not), (ii) the citations which it draws from them, and (iii) the
dictionary entries. As will be apparent, the underlying methodology is no different
from what it was before the computerization of the project, and more generally,
before the advent of computing in the humanities from the 1980s onwards. (I appreciate
that humanities computing started earlier than the date suggested here, but for the
purposes of this project, and for the vast majority of researchers in the humanities in
the UK at least, the 1980s saw the emergence of affordable personal computers
capable of significantly changing working methods.) What is different, however, is how
the lexicographical process is implemented in a digital dictionary, and (to some
extent) what the implications of this process might be.
§ 2 At the heart of any dictionary lies the assembling of the core data on
which it is based. In the pre-electronic age, there were three major components of
the process. The first was the production of concordances, exemplified by Alexander
Cruden's manually-compiled concordance to the King James Bible in the eighteenth
century. As William Youngman's biographical memoir of Cruden, appended to the printed
version, indicates (Youngman 1891, xiv), Cruden
appears to have spent a good deal of his life in "happy and harmless lunacy",
probably in part as a result of this enormous and painstaking undertaking. Around
1320, the English Franciscan Nicole Bozon describes the technique of the friars as follows:
Pur ceo vodereie qe chescun feseit com fierent jadis les freres qe compilerent [les] concordaunces. Chescun prist gard a la lettre qe a lui fist mandee. Cil qe aveit A ne avoit qe fere de B, e cil qi out gard de B, rien se entirmettout de C; et si qe chescun lettre del abicee a divers estoit liveree, et chescun se prist a sa lettre, e nul de vousist de autri fet se entremetter ... (Meyer and Smith 1889, 160)
§ 3 This, in fact, looks ominously like the methodology often deployed in the construction of dictionaries.
§ 4 The second component of any dictionary project is typically the gleaning of quotations from texts. Traditionally, this entailed reading, and copying out onto slips a quotation, perhaps with a translation, a bibliographical reference, a date, and so forth. The Anglo-Norman Dictionary (henceforth: AND) still has a substantial number of these slips, deriving from the early days of the dictionary in the 1940s, or, in the case of the one million or so slips of the J.P. Collas collection, bequeathed to Professor William Rothwell. They are surprisingly durable and largely legible, as these original (1940s) examples show:
§ 5 Later examples from the Collas collection continue also to furnish invaluable material:
§ 6 Finally, the first edition of the AND was compiled via manually-typed and subsequently annotated 210mm × 125mm slips, in which Professor William Rothwell's hand is (to the connoisseur) instantly identifiable:
§ 7 The third component in the compilation of the majority of dictionaries is the use of glossaries provided by the editors to critical texts. For a variety of reasons, not least a well-founded scepticism about the reliability of glossaries, the AND has not usually made much direct use of glossaries, and certainly does not rely on them. One conspicuous problem, familiar to anyone who regularly uses critical editions of medieval texts, is that glossaries typically omit precisely those words which the lexicographer finds interesting, or difficult. Whether this is because these words are of no interest to the editor of the text, or because their meaning is deemed self-evident and only problematic for the dim-witted lexicographer, must remain a matter for speculation. Less charitable interpretations to account for such sins of omission may well come to mind.
§ 8 At the heart of a dictionary (any serious dictionary) thus lies the assembling, and subsequent ordering (according to semantic or historical criteria) of a number of citations drawn from as wide as possible a range of relevant texts. Whilst dictionaries of modern languages can reasonably aspire to exhaustiveness, in particular by the use of vast electronic corpora, this is probably never achievable for medieval languages. In the first instance, it is very difficult to imagine successfully creating a properly representative corpus; and clearly, it is also in practice unlikely that we will ever digitize the entirety of the surviving documents in even as relatively circumscribed a language as Anglo-Norman. It would be more correct to describe even the most comprehensive dictionaries of medieval languages as making use of textual data banks or what I have called here text-bases, rather than corpora stricto sensu.
§ 9 The problem in this respect is not solely one for the lexicographer.
It is fundamental to our understanding of any past state of a language, of necessity
(if it is older than approximately a century) mediated exclusively through written
texts. These, however wide-ranging in register and type, and however much they may
appear to approximate to a reproduction of speech, are of course never an accurate
record of the living language in its spoken form. Moreover, as the following diagram
tries to show, the relationship between texts and the
language is analogous to Saussure's dichotomy between
langue and parole.
§ 10 In this, the text (the product of an individual author) is the Saussurean parole, a manifestation and an emanation of the irrecoverable langue, the phonetic, grammatical and lexical system of (in this case) Anglo-Norman. It is irrecoverable not only because we have no access to the spoken form, but also because (in the absence of native-speaker consciousness) we have no real way of fully grasping the complexities and the rules of the langue. Both denotational meaning and particular connotations may well escape us. The lexicographer of medieval languages is thus left with a quantity of examples of parole from which it may (or may not) be possible to reconstruct the underlying langue which is ostensibly the object of his survey.
§ 11 From the text, the lexicographer takes his citations. Those citations, in turn, are fed into the dictionary, constituted into articles, conventionally but not invariably organized alphabetically (although a rare but useful alternative is an onomasiological ordering by concepts), and, depending somewhat on the thoroughness and comprehensiveness of the dictionary, the dictionary itself becomes, at least as far as the lexis of the language is concerned, an attempt, however imperfect, to encompass the language as a whole (langue).
§ 12 Thus, a properly-conceived dictionary ought to be a means of access to the vocabulary of (ideally) an entire language, structured in a way which makes access for the reader as easy as possible. Digitization does not change this, although, as we shall see, the digitization process itself can impose order and consistency on major publications which are otherwise all too open to variability because of human factors, especially over the protracted time-scales within which the typical dictionary is put together. Dictionaries, in fact, lend themselves particularly well to digitization, not least because they are structurally very predictable. By this, I mean that they are very conventional in the way they set out information. It is not, for example, necessary to know any Spanish at all to make sense of the entry below, derived from a major etymological dictionary of Castilian, J. Corominas's Diccionario crítico etimológico de la lengua castellana (Corominas 1954):
§ 13 The typographical and organizational conventions adopted mean that even a reader with no knowledge whatever of Spanish will immediately work out that the first word (in capitals) is the headword, that what follows (in inverted commas) is the explanation of it, that the information that follows is etymological, and that the last paragraph supplies additional information about derivatives of one sort or another. Likewise, no knowledge of Sanskrit is needed in order to understand the gist of the entry dushta from the late-nineteenth-century Sanskrit-English Dictionary by Sir Monier Monier-Williams (Monier-Williams 1899):
§ 14 In other words, experience of handling dictionaries means that the organizational structure is intuitively understood and (with the help, in particular, of typographical features) decoded. The process is not unlike that of reading a map, where experience leads us to know which way up to hold it, that the top of the map is (other things being equal) likely to be north, and so forth. The AND is no exception to this general rule. The sample entry below (janglure) illustrates the point. The initial words in bold are the headword, and the variant spellings attested. This is followed by an abbreviation meaning that the word is a substantive (s.) and a gloss, in English, and in italics. A series of quotations with bibliographical references (explained in the associated list of texts) follows, and the article concludes with cross references to related or otherwise relevant words. Component parts of the dictionary entry are identifiable by position and by typography. Therefore, structure and meaning go hand in hand.
janglure, janglur (janlur TLL ii 59)
s. babbling, (foolish) chatter: buccum: gabur, gangeler, janglur TLL ii 42; Malveise gent [...] Mult estes ore de mal escole Ke onur [...] ne feites A la reine [...] Ne lessates vostre janglure Mir N-D 179.130 → jangle, jangleis, janglement, jangler, janglerie.
§ 15 This, then, is an arrangement of material which, by virtue of its structural regularity, lends itself very readily to digitization in an orderly and structured way. Component parts of the entry are easily identified and easily fitted into a schema, and marked up accordingly. Even in the case of the AND, substantial parts of which were compiled in Microsoft Word and subsequently converted to XML (not the best solution by any means), the process was reasonably straightforward. Inevitably, in a work compiled over a period in excess of forty years, inconsistencies of abbreviation, textual reference, and so forth emerged, but not to such an extent that computerization of the original Word files posed insurmountable problems. Since 2003, however, the composition of AND articles has been done directly in XML, using a piece of editing software called EpcEdit (www.epcedit.com). From this it is possible to see the structure of the same entry janglure (see below).
§ 16 This is the editorial XML, that is, the simplified form
in which articles are produced. The dictionary's DTD (document type
definition, which prescribes the categories, and options within them, for markup
and encoding, and imposes the overall structure of the document and of its component
parts) imposes a template on the entry, incorporates markup for the different
structural elements and makes explicit those components of the dictionary which in
print form would be identifiable by typography or position or both. The DTD enforces
a uniform ordering of component parts of the article, and limits editorial choices in
a number of key areas (bibliographical abbreviations, usage labels, parts of speech)
to those which are pre-supplied in a drop-down menu. Variation resulting from human
inconsistency is thus largely eliminated and the disambiguating discipline of having
to abide by the constraints of the DTD is directly beneficial to enforcing regularity
of presentation. Articles in this form are subsequently transformed on the project's
server into canonical XML for storage and to form the version of the
data from which articles are generated, on request, by users. In fact, the canonical
XML is not (and could not be) dramatically different from the editorial version, but
it does add certain details and in particular, system-internal identifiers:
§ 17 The online articles which the user of the AND sees on his or her computer screen are produced directly from the XML of the canonical data, rendered on-screen in HTML. The exact date and time that appears on the screen is the date of production of the article, on that occasion, and on that screen.
§ 18 Looking in more detail at the XML encoding of a component part of the entry janglure, it is apparent how far the XML markup identifies and classifies elements which the reader of a print dictionary would simply absorb more or less unconsciously. For the following citation:
buccum: gabur, gangeler, janglur TLL ii 42
The XML looks like this:
§ 19 Each structural element (here coloured, though not in the original) is marked up. Within the overall <cit>, or citation, is <quote>, the quotation from the source; the language of the word being glossed, buccum, is indicated by the value "LA" (Latin) of the lang attribute on the segment, <seg lang="LA">; the bibliographical siglum (TLL) is accompanied by a volume and page reference, which is separately encoded as <loc>; and so on. The citation, the lowest-level component of the text-citation-dictionary triangle, is thus encoded in detail so that all elements will be identifiable and recoverable by XML-based search and query programs.
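By way of illustration only, the following Python sketch shows how a generic XML parser can recover the components of a citation encoded along these lines. The fragment is a hypothetical reconstruction: only the element names <cit>, <quote>, <seg lang="LA"> and <loc> come from the description above, while the <bibl> wrapper, its siglum attribute and the function name parse_citation are assumptions, and the actual AND markup may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. <cit>, <quote>, <seg lang="LA"> and <loc> are named
# in the text; <bibl> and its "siglum" attribute are assumptions.
CITATION_XML = """
<cit>
  <quote><seg lang="LA">buccum</seg>: gabur, gangeler, janglur</quote>
  <bibl siglum="TLL"><loc>ii 42</loc></bibl>
</cit>
"""


def parse_citation(xml_text):
    """Return the recoverable components of one citation as a dict."""
    cit = ET.fromstring(xml_text.strip())
    quote = cit.find("quote")
    bibl = cit.find("bibl")
    return {
        # itertext() joins the quotation text with that of nested segments
        "quote": "".join(quote.itertext()).strip(),
        # words explicitly tagged as Latin inside the quotation
        "latin": [s.text for s in quote.iter("seg") if s.get("lang") == "LA"],
        "siglum": bibl.get("siglum"),
        "loc": bibl.find("loc").text,
    }


citation = parse_citation(CITATION_XML)
print(citation)
```

The point of the sketch is simply that, once the glossed Latin word, the siglum and the location are each tagged separately, every one of them can be queried on its own rather than inferred from typography.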
§ 20 In what follows, I will consider the way in which citations and texts are linked, and how both are exploited in dictionary production. There are three principal constituent elements in the Anglo-Norman On-Line Hub, of which the AND is one. The second is a text base, consisting of digitized Anglo-Norman texts and a couple of editions produced by W. Rothwell; the third, and the one which underlies the other two, is the canonical XML data for the dictionary and text base alike.
§ 21 The digitization of the dictionary, of the texts, and of the academic articles which are also housed on the hub follows the same DTD, and thus the constituent elements are entirely interchangeable. From the admittedly somewhat narrow perspective of a lexicographer, a text is just a series of (potential) citations, to be mined and exploited for the dictionary. Equally, the dictionary itself can be regarded as a series of articles, senses of words, and then, at the most basic level of the article, a series of citations. In other words, the entire operation can be seen (both in the dictionary and in the texts) as a concatenation of citations. In an ideal world, a given text containing (for example) six citations will feed into a series of articles. A citation from a given text is simply taken and transformed into a citation within a dictionary article. This, of course, is nothing new: it is how dictionaries have always been produced. Digitization, however, concentrates the mind on the underlying process, because the technology has to reflect what is happening. Identical XML encoding, following the same DTD, is the key to the process whereby the citation in a text becomes the citation in an article. Needless to say, this process applies in so direct a form only to those texts which are digitized and available on the Anglo-Norman On-Line Hub site, since all other citations (and texts) have no digitized existence outside the dictionary article into which they are inserted. For citations which do exist in digitized online texts, though, the transposition into a dictionary-article citation is (apparently) straightforward. A given citation in a text becomes a citation in an article:
§ 22 Let us take a concrete example of how this works. The following excerpt, from Walter of Bibbesworth's treatise, contains, amongst other things, two words of possible interest, in verses 317 and 319:
Veez ci veint devaunt vous
Un chivaler bieau tut rou
Qui une destrere sor se est munté reed
Esku de goules ad porté reed
Un launce rouge en l'uyn mein,
De vin vermaille l'autre plein,
Qi ne manjuwe point de peschoun
S[i] de le haranc sor noun reed
Je vie une reyne sanz rey quene
Pur une reyne fere desray frock
(Rothwell 2009, vv. 310-319)
§ 23 Both of these, as it happens, are found in the relevant dictionary entries (harang and desrei). From the lexicographer's point of view, in other words, they constitute separable and exploitable citations. This is, admittedly, a peculiarly mechanistic interpretation of what a text consists of, but it is, nonetheless, key to understanding the relationship between the different component parts of a dictionary/text-base set-up. Naturally, a given citation can be used more than once (and often is): in the Bibbesworth excerpt just quoted, for example, line 317 also contains an instance of the word sor, of noun, and so forth.
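This mechanistic view of a text as a series of addressable citations can be sketched in a few lines of Python. The sketch is not a description of the Hub's implementation: the data structures and the function names build_index and context are hypothetical, the interlinear English glosses are omitted, and treating sor and noun as headwords in those spellings is an assumption (only harang and desrei are named above as entries).

```python
from collections import defaultdict

# Verses 316-319 of the excerpt quoted above, keyed by verse number
# (interlinear English glosses omitted).
TEXT = {
    316: "Qi ne manjuwe point de peschoun",
    317: "S[i] de le haranc sor noun",
    318: "Je vie une reyne sanz rey",
    319: "Pur une reyne fere desray",
}

# Hypothetical excerpting decisions as (headword, verse) pairs. "harang" and
# "desrei" are entries named in the text; "sor" and "noun" are assumptions.
EXCERPTS = [("harang", 317), ("desrei", 319), ("sor", 317), ("noun", 317)]


def build_index(excerpts):
    """Map each verse number to the headwords under which it is cited."""
    index = defaultdict(list)
    for headword, verse in excerpts:
        index[verse].append(headword)
    return dict(index)


def context(text, verse, window=1):
    """Return the cited verse with up to `window` verses on either side."""
    return [text[n] for n in range(verse - window, verse + window + 1) if n in text]


index = build_index(EXCERPTS)
print(index[317])          # one verse feeding several articles
print(context(TEXT, 319))  # a citation followed back into its text
```

The same verse number serves as the key in both directions: from the text to the articles which cite it, and from a citation in an article back to the surrounding lines.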
§ 24 One of the most powerful facilities which the AND offers is the capacity to search the entirety of the citations in the dictionary, which are thus, from that point of view, simply one enormous electronic text. For the reasons given above, this does not mean that the dictionary is a corpus in the technical sense in which the word would be used by corpus linguists; but it does provide to the user, and indeed for purposes completely unrelated to the dictionary, a valuable extra resource. Occasionally, of course, this facility exposes shortcomings in the dictionary itself when, for example, the search of citations reveals words for which there is no corresponding dictionary article. So, for example, under feu2, we find the word examen, which (as a rapid perusal of the alphabetical list of AND articles will reveal) is absent as an entry in the dictionary. The message provided by a search of the citations in the dictionary is unambiguous in identifying the gap:
Your search for examen did not match any headwords or variant forms.
The form was, however, found in 1 citation:
transcrire et copier, ou jugement, par bon collacion et examen, le testament de feu Thomas de Uvedale Foedera iii 846 [sub feu2]
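The logic behind such a message can be approximated as follows. This is a minimal sketch over toy data, not the AND's actual search code: the ENTRIES structure and the function search_form are hypothetical, and the real system queries the XML data rather than Python dictionaries.

```python
import re

# Toy data, drastically simplified: headword -> variant forms and citations.
# The structure is an assumption made for the purposes of illustration.
ENTRIES = {
    "feu2": {
        "forms": {"feu"},
        "citations": [
            "transcrire et copier, ou jugement, par bon collacion et examen, "
            "le testament de feu Thomas de Uvedale"
        ],
    },
    "janglure": {
        "forms": {"janglure", "janglur", "janlur"},
        "citations": ["buccum: gabur, gangeler, janglur"],
    },
}


def search_form(form, entries):
    """Look a form up among headwords and variants, then in all citations."""
    headword_hits = sorted(
        hw for hw, entry in entries.items() if form == hw or form in entry["forms"]
    )
    # Whole-word, case-insensitive match within the citation text
    pattern = re.compile(r"\b" + re.escape(form) + r"\b", re.IGNORECASE)
    citation_hits = [
        (hw, cit)
        for hw, entry in entries.items()
        for cit in entry["citations"]
        if pattern.search(cit)
    ]
    return headword_hits, citation_hits


headwords, citations = search_form("examen", ENTRIES)
if not headwords and citations:
    print(f"'examen' matches no headword but occurs in {len(citations)} citation(s):")
    for hw, cit in citations:
        print(f"  {cit} [sub {hw}]")
```

A form which yields citation hits but no headword hits is, by definition, a candidate for a missing entry, which is how the gap is exposed.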
§ 25 In the light of this, the online AND has now
of course quickly added the missing entry examen. The relationship
between citations and articles operates also in the opposite direction, that is, in
allowing the user to go from a citation in the dictionary to its context. This, of
course, only functions in the case of those texts which have been digitized as part
of the project. Thus, for example, within the entry coillage
("levy, tax"), the last quotation (from the Black Book of the
Admiralty) can be followed into the text itself:
coillage, cueillage; quilage
s. levy, tax: soloient paier la greindre partie de les dismes de la dite ville, leins et autres coillages […] Rot Parl 1 ii 213; une custume qe l'em apele hildenrath (l. hildevrath), c'est assaver, quilage des aveynes King's Bench v.cxxxv; ♦ keelage: que coillage (var. cueillage) ne soit payé par la coste d'Angleterre, mais ancorage Blk Bk 74 → kylage.
§ 26 This, clearly, is something which is not exclusive to a digitized dictionary that functions in this way, since the user has always been able to turn to a text which has been excerpted for the dictionary and revisit a citation in the broader context – provided, of course, that the text is available and at hand. Digitization makes this instantaneously possible anywhere in the world, and without needing recourse to a library. In this case, the wider context makes more explicit the specific sub-sense of keelage which, or so the dictionary argues (and see, by way of comparison, Trotter 2003), the word bears in this particular citation:
Item, ordonné estoit illecques q'une manere de coustume seroit pris par tout le royalme d'Angleterre en eaue, et les admiralx estoient de ce fermement chargez qu'ilz ou leurs lieutenants deux foiz ou troiz foiz en l'an enquerront de ce fermement, ainsi que nul allene marchant ne privé ne soit endommagé par cause des coustumes, et que coillage ne soit paié par la coste d'Angleterre mais ancorage … (Twiss 1871, 74)
§ 27 The relationship between texts, the dictionary, and the citations which are the constituent elements of both, is not unique to the AND, and is also implemented in (for example) the Dictionnaire du Moyen Français. It is likely that in the future, with the growing availability online of significant quantities of digitized texts (for example, Base de Français Médiéval, Nouveau Corpus d'Amsterdam), the practice will be extended and indeed generalized, with the proviso that, for this to be possible, strict adherence to mutually comprehensible encoding frameworks will be essential. At present, despite (or perhaps because of) systems such as TEI, there is wide variation in how materials are encoded, and many projects display a somewhat disturbing instability in terms of the target which they offer and to which links could be created.
§ 28 The question of electronic links to other cognate projects brings me to
the final part of this paper. No dictionary exists in isolation, and all dictionaries
are part of an international and multilingual network of lexicographical resources
which collectively attempt to record and explain the vocabulary of the languages of
the world. The AND is no exception. However, it has a
slightly unusual additional function, in that it stands between English and French,
as the record of the variety of French introduced into medieval England after the
Norman conquest, and which subsequently dramatically relexified the English language.
Just as Anglo-Norman was the vector for the transmission of Romance vocabulary into
(Germanic) Anglo-Saxon, so the AND is the link between
French and English lexicography. It has, of course, connections also to other Romance
dictionaries, and to dictionaries of related Germanic varieties (Dutch, German, the
Scandinavian languages) as well as to medieval Latin. This intellectual network has
always existed, and digitization does not transform the essential relationship
between the AND and other dictionaries. But where
digitization does substantially change the picture is in its potential for enabling
direct connections to be made from one online dictionary to another. This, like the
proper exploitation of online databases to which I have already alluded, is dependent
on the existence of properly encoded and, above all, stable targets for links going
from one dictionary to another, and for the moment this is not altogether
unproblematic. Nevertheless, as a dictionary like the DMF
already shows, the potential is there and, in the not too distant future, it is
possible to envisage a situation where all the relevant related dictionaries are
interlinked in such a way that it will become possible to rapidly review the entirety
of lexicographical evidence, irrespective of language. That would be a significant
step forwards, and would allow us to reassemble in its full multilingual complexity
the lexical landscape of medieval Europe. Digitization is thus, ironically for so
modern a process, a way to return to medieval reality, via the
Middle Ages of the title of the MARGOT conference.
glossary committee, founded in Oxford, under aegis of ANTS
L'équipe BFM. 2011. Base de français médiéval. http://bfm.ens-lyon.fr/. Accessed January 25, 2012.
Beddow, Michael. 2007. L'Anglo-Norman on-line hub: une présentation technique. In Actes du XXIVe Congrès International de Linguistique et de Philologie Romanes, Aberystwyth. Ed. David Trotter. 1: 305-310. Tübingen: Niemeyer.
Corominas, Juan. 1954. Diccionario crítico etimológico de la lengua castellana. Berne: Francke.
Kunstmann, Pierre and Achim Stein. 2012. Nouveau Corpus d'Amsterdam. http://www.uni-stuttgart.de/lingrom/stein/corpus/. Accessed January 25, 2012.
Meyer, Paul and Lucy Toulmin Smith, eds. 1889. Nicole Bozon, Contes moralisés. Paris: SATF.
Monier-Williams, Sir Monier. 1899. Sanskrit-English dictionary. Oxford: Oxford University Press.
Rothwell, William. 2009. Walter de Bibbesworth: le Tretiz. Aberystwyth: Anglo-Norman Online Hub. http://www.anglo-norman.net/. Accessed January 25, 2012.
Trotter, David. 2000. L'avenir de la lexicographie anglo-normande: vers une refonte de l'Anglo-Norman Dictionary? Revue de Linguistique romane 64: 391-407.
Trotter, David. 2003. Langues en contact en Gascogne médiévale. In Actas del XXIII Congreso Internacional de Lingüística y Filología Románica, Salamanca, septiembre 2001. Ed. F. Sánchez Miret. 3: 479-486. Tübingen: Niemeyer.
Trotter, David. 2007. Habeas corpus ad testificandum: l'Anglo-Norman Dictionary et son corpus. In Le nouveau corpus d'Amsterdam. Actes de l'atelier de Lauterbad, 23-26 février 2006. Ed. Pierre Kunstmann and Achim Stein. 153-157. Stuttgart: Steiner.
Trotter, David and Andrew Rothwell. 2007. Présentation de l'AND. In Actes du XXIVe Congrès International de Linguistique et de Philologie Romanes, Aberystwyth. Ed. David Trotter. 2: 413-421. Tübingen: Niemeyer.
Twiss, T. 1871. Black book of the Admiralty. Rolls Series. 1. London: Longman et al.
Youngman, William. 1891. Sketch of the life and character of Alexander Cruden. In A complete concordance to the Old and New Testament. Ed. Alexander Cruden. xii-xv. London and New York: Warne.