objective categories for medieval scripts is an
ambitious endeavour. Indeed, a new challenge is presented by the application of
automated analysis and clustering of handwritings to the written production of the
Middle Ages. What is needed is a cross-pollination of the diverse approaches to the
creation of taxonomies in the humanities (medieval palaeography) on the one hand and
in computer science (image analysis) on the other, a comparison of results, and
an attempt to understand why they are similar or different. It follows that newly
developed tools require new evaluation protocols that both rest on a ground
truth proper to computer science (i.e. proper, objective, provable data)
and, at the same time, allow for the application of the rather more slippery
categories employed in the humanities. The clustering of scripts reflects the
subjectivity of the interpreter, and his or her often unstable categories of
interpretation; as a result, as with interpretation or attribution studies in art
history (Ginzburg 1979), all script categories remain
subject to debate and discussion.
§ 2 The first attempts at automatically clustering scripts were made by Arianna Ciula on the basis of some letter shapes (Ciula 2005), and by the Graphem research project (Grapheme based retrieval and analysis for palaeographic expertise of medieval manuscripts, funded by the French National Agency for Research, ANR-07-MDCO-006, 2007-2011) on the basis of automated image analysis without the need to select individual letters. In the latter project, several methods were created, developed, or improved in order to categorise medieval scripts automatically on the basis of 9800 images from the catalogues of dated manuscripts in France: these include Word Spotting, curvelets, generalised co-occurrence, stroke analysis and codebook (Cloppet et al. 2011; Daher et al. 2011; Joutel 2011; Lebourgeois and Moalla 2011; Leydier, Lebourgeois, and Emptoz 2007; Leydier 2009; Siddiqi and Vincent 2009; Siddiqi, Cloppet, and Vincent 2011). The research program ORIFLAMMS (Ontology research, image feature, letterform analysis on multilingual medieval scripts), which began in 2013 and will end in 2016, tackles the same issues with new methods that combine letterform analysis with Computer Vision for script classification.
§ 3 As part of the ORIFLAMMS research program, the present contribution addresses two related issues in categorising historical scripts: first, the creation of an adequate evaluation protocol to assess classification results of Computer Vision methods, and secondly, the epistemological notions of evidence in the Computer Sciences and the Humanities.
§ 4 The article is organised as follows: Section II addresses the issues of palaeography as a fundamental field of cultural history, in which there are classifications and criteria for analysing the scripts, but no consensus and too many overlaps (II.1), a consequence of which is that clustering is an ill-posed problem in Computer Vision (II.2); one possible avenue of research, analysis of the dynamic element of handwriting or 'ductus,' is shown to be beyond the scope of computer vision (II.3). Section III introduces a new database and collection of images which will serve as an evaluation tool for new algorithms. Section IV presents an evaluation of two different methods and their results kindly provided by Wolf and Lebourgeois. The concluding Section V suggests two different ways for computer vision and palaeography to contribute to one another.
The missing clusters: The history of script as a continuum
Palaeography as expertise: Internal factors used in the analysis and taxonomy of handwriting
Palaeography as a discipline is the history of handwriting, and,
as such, is a specific field within the study of history and in particular
within medieval studies. This discipline studies the appearance and
development of different types of script and their uses by diverse social
groups across time and in diverse documents (books, records, charters,
etc.). Further, it analyses the transmission and reception
of a written message, attending not only to the text but also to its form,
script, layout and support.
The date and place of production of written texts is a key
question, of importance not only for historians, but also for linguists and
philologists. Yet most books written in the Middle Ages bear no indication
of date or place of origin, and the script is an important clue to assess
the origin of the manuscript. Thus, palaeographers have both to assess when,
where, and under which circumstances a text was written, and to balance
arguments based on internal and external characteristics. Scholars find
themselves caught in a paradox: on the one hand, establishing a chronology
of writing styles is absolutely crucial in order to study the
Zeitgeist or forma
mentis of any period of the written civilisation or to trace
evolutions and influences, and yet, conversely, this profound understanding
of the medieval society and of the scripts in use is very often the only way
to establish a putative chronology of the written testimonies. Comparisons
à la Panofsky have
been made regarding how Romanesque, Gothic, and late Gothic scripts are
related to the contemporaneous architectural styles (Marichal 2005; Panofsky 2005; Stiennon 1991). In the Gothic textualis script, for example, the number of elementary shapes and
vocabulary of different strokes is reduced in comparison with previous
scripts, and each of these elementary shapes is used to build several
different letters, as if they were some architectonic elements in a
cathedral: for example, the letter
m is drawn by
juxtaposing three identical forms whereas in Caroline scripts, there were
often three different forms for the three minims. Likewise, some parallel
evolutions hint at a specific, time-bound forma mentis, as in the monumental Hebrew quadrata of the 13th century
and in contemporary Latin textualis. Such comparisons are very suggestive. Nevertheless,
they are not sufficient to delineate the boundaries between different
scripts and script styles, particularly in transition periods, or to explain
historical evolutions as such.
§ 7 An additional difficulty arises from the fact that, in the Middle Ages, several script families and script types are used contemporaneously, so that two handwritings of the same date need not be similar or belong to the same category. For instance, a Gothic textualis script dated 1470 CE resembles more closely a Gothic textualis of the late 13th century than a Humanist script of the late 15th century. The study of Latin script throughout the Middle Ages, from the 7th to the 15th century, requires the identification of script families, in order to be able to trace their uses and development across time and region, and their influences. This, in turn, necessitates the categorisation of different scripts, in spite of the debates among palaeographers about the classification(s) to be used, or about the implementation of such classifications (Derolez 2003; Smith 2004).
§ 8 It is a more challenging task to provide such a classification than to identify individual scribes. Indeed, the thousand years of the history of scripts in the Middle Ages exemplify the whole complexity of the human mind in the use of several scripts that all evolved in style and morphology simultaneously and with reciprocal influence on each other. Letter shapes appear, disappear and reappear; they evince recurring techniques as well as changing methods in the addressing of core issues in written communication (textual transmission, staging of the message, physical constraints, scribal ability, etc.).
§ 9 As a result, in order to answer such questions as the date and context of production, palaeographers analyse the script and its features in order to reconstruct a scientifically probable hypothesis, based on the knowledge about medieval written communication. With respect to both the task of identifying the date and that of determining the writing system, Computer Vision may assist palaeographers in categorising and dating handwritings, either by establishing new clustering criteria or by applying expert-systems.
Palaeographers partly agree on some key factors in their analysis
of script. In particular, they tend to agree on the seven aspects of a
medieval hand introduced by J. Mallon: forms, angle of writing, ductus,
module, weight, writing support, internal characteristics (Aussems and Brink 2009; Mallon 1952; Muzerelle 2013a). These
factors mainly concern the graphical aspect of handwriting – the
external characteristics – and are a
means to describe the script. The last aspect of writing, the
internal characteristics, constitutes a means to
interpret and analyse the handwriting in its social signification; for
example, the handwriting of a late antique emperor's rescript may differ
greatly from a contemporaneous notary's writing. Unfortunately, not only can
some of the external characteristics be interpreted diversely, e.g.
writing angle and
weight (density or
contrast) (Aussems and Brink 2009; Muzerelle 2013a), but also there is no
consensus on which aspect is predominant for script classification and what
makes two scripts identical or similar, when not all aspects are
§ 11 The most recent research tends to focus on Mallon's seven aspects and relies mostly on letter forms (Mooney, Horobin, and Stubbs 2012; Scragg et al. 2010; Stokes 2011), as is generally the case for late-antique and early-medieval book scripts. More complex approaches are based on the structure of the graphical chain, i.e. the sequence and combination of signs and characters, either through the distinctive use of allographs (Oeser 1971; Oeser 1994; Oeser 2001; Stutzmann 2010; Stutzmann 2013; Stutzmann 2014a), or through the analysis of connected characters (Casamassima 1988; Ceccherini 2007; Ceccherini 2008; Zamponi 1988).
Derolez's classification of Gothic book scripts
- textualis: two-compartment a; f and long s on the line; loopless ascenders (b, h, k, l);
- cursiva antiquior: two-compartment a; f and long s below the line; loops on ascenders (b, h, k, l);
- cursiva: single-compartment a; f and long s below the line; loops on ascenders (b, h, k, l);
- hybrida: single-compartment a; f and long s below the line; loopless ascenders (b, h, k, l);
- semitextualis: single-compartment a; f and long s on the line; loopless ascenders (b, h, k, l);
- semihybrida: single-compartment a; f and long s below the line; irregularly loopless ascenders (b, h, k, l).
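The six categories listed above can be read as a lookup on three letter-form criteria. The following is a minimal sketch of that reading, not an implementation used in the project: the names `Features`, `DEROLEZ` and `classify`, and the encoding of ascender loops as "none", "all" or "mixed" (the last standing for the irregular case of semihybrida), are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    """Three letter-form criteria (field names are illustrative)."""
    a_compartments: int    # 1 = single-compartment a, 2 = two-compartment a
    f_s_below_line: bool   # True if f and long s descend below the line
    ascender_loops: str    # "none", "all" or "mixed" (irregular)


# One entry per category in the list above.
DEROLEZ = {
    Features(2, False, "none"): "textualis",
    Features(2, True, "all"): "cursiva antiquior",
    Features(1, True, "all"): "cursiva",
    Features(1, True, "none"): "hybrida",
    Features(1, False, "none"): "semitextualis",
    Features(1, True, "mixed"): "semihybrida",
}


def classify(features: Features) -> Optional[str]:
    """Return the category name, or None if the combination is not listed."""
    return DEROLEZ.get(features)


print(classify(Features(1, True, "none")))
```

Combinations that the list does not cover return `None`, which makes explicit that the three criteria alone do not partition every possible hand.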
§ 13 Along with letter forms, the second key concept of Derolez's taxonomy is formality. There are three levels of formality: currens, libraria and formata. Formality helps to understand and classify different scripts within a single family. It is a functional concept and crosses or subsumes several of Mallon's aspects. This concept is of great importance and if we expand its use to handwritings of the High Middle Ages, we might consider that it is one underlying factor in the differences in morphology between Caroline minuscule and Caroline 'Glossenschrift,' as distinguished by B. Bischoff (Bischoff 1954), or between textualis and semitextualis (e.g. for the littera [textualis] parisiensis and the currens form tending to a semitextualis) (Stutzmann 2005). As a consequence it has been integrated by scholars in discussions of the history of scripts (Smith 2011).
Difficulties in script classification: Two examples
§ 14 Derolez's method of classification is the only classification system to cover the great diversity of Gothic handwriting. Nevertheless, because it is mainly based on morphological differences, the system has been criticised for instances in which categories overlap or where distinctions between categories are less than clear. Overlaps and exceptions would appear in any classification system because classification systems by their nature serve to divide an historical continuum into discrete sections, and thereby create sometimes arbitrary separations within what is, in fact, a continuous progression. In particular, the difficulty in asserting a particular level of formality has been criticised, because no strict criteria are provided for assessing formality. As a result, this concept is very difficult to use both for script analysis in palaeography (Smith 2004; Stutzmann 2005), and for computational tools.
§ 15 Two examples of seemingly misleading classification using this taxonomy can be cited to illustrate the problem.
§ 16 On the one hand, let us compare Chaumont, Bibliothèque municipale, 37, f. 1r (see Figure 1) with Paris, Bibliothèque nationale de France, latin 10677, f. 121r (see Figure 2). The former is written in littera textualis, with two-compartment a, and f and long s on the line. The latter is written in a very neat littera hybrida with one-compartment a as well as f and long s descending below the line. Yet both are close in date of production (the manuscripts are datable around 1452 and 1480-81 respectively), and in style (esp. with respect to the verticality of the script and accessory elements, like additional decorative, thin pen strokes). The very notion of style is difficult to assess in an objective manner, and yet the common general impression is one of thickness and uprightness. Here, too neat a distinction between textualis and hybrida may seem misleading, even if the formal criteria distinguish both scripts very clearly.
§ 17 In contrast, sometimes associations may seem inaccurate. If we compare, for example, Bordeaux, Bibliothèque municipale, 404, f. 146v (see Figure 3) and Paris, Bibliothèque Interuniversitaire de la Sorbonne, 1033, f. 14r (see Figure 4), we may observe that both handwritings, produced within 25 years of each other (respectively 1423 and 1448), show the same structure: littera textualis for the beginning of paragraphs and littera cursiva for the rest of the text. In this case, both manuscripts not only show the a, l, f shapes which define the scripts, but also other common allographs (d, r). The main difference consists in formality, as it is reflected in ductus, form reductions (e.g. in letter g), and velocity. Nevertheless, we might argue that distinguishing between cursiva libraria and cursiva currens is not a sufficient expression of the great visual difference between the scripts.
§ 18 These two cases demonstrate that Derolez's classification system segregates handwritings from one another, or groups them together, when, at first sight, they would have been considered very similar or very dissimilar. Nevertheless, it must be stressed that, on the one hand, there is no other classification covering the complete range of Gothic scripts, and that, on the other hand, exceptions would probably appear in any other classification system.
As already mentioned, the only consensus existing among
palaeographers concerns the unity of Latin scripts as a whole, and the
canonised scripts. The lack of a
system encompassing all medieval scripts that would allow their in-depth
study is a vicious circle. As a consequence of reciprocal influences,
progressive evolutions, and resurrections, every representation of script
history can be confronted with contradictory examples.
§ 20 We propose using consistently one of the taxonomies elaborated by the palaeographers, despite unavoidable shortcomings, and basing image analysis research on it: that is, to perform machine learning on an annotated dataset, and then to consider the results given by the machine for all ambiguous scripts. This process can then be repeated with other taxonomies. In this manner, we will be able to trace the strength and usefulness of these taxonomies for script history, as well as detect moments or forms that provide a less clear picture and thereby indicate those periods that require further research. Instances of overlap and occurrences across categories will then suggest fruitful areas for research. By means of a more clearly defined subject palaeographers can identify and discuss the relative importance of criteria for script analysis and the areas in which they apply. For this reason, we suggest using extensively one classification system in order to identify overlaps, and indeed, in order to gain insight into what is specific to a script type beyond the mere morphology of the letters.
Clustering of script as an ill-posed problem
§ 21 The implementation of computational methods to cluster scripts from the one-thousand-year history of medieval written production presents a difficult challenge. Most of the research in image analysis developed for or applied to medieval scripts addresses the question of scribal identification within a homogeneous corpus: either the writings of one particular scribe or manuscript, or the production in a particular chancery or limited time-frame, e.g.: studies on Christine de Pisan (Aussems 2007; Aussems and Brink 2009), Clara Hätzlerin; MS. Heidelberg, Universitätsbibliothek, Cpg 329 (Hofmeister, Hofmeister-Winter, and Thallinger 2009); the counts of Holland (Smit 2010); England in the first third of the 11th c. (Stokes 2014).
§ 22 If the issue of the clustering of scripts is to be tackled, the afore-mentioned difficulties must be addressed. Not only is it impossible to attribute a specific date or place of production to most of the extant material, but also there is no consensus on how to analyse medieval and especially late-medieval scripts. As a consequence, proofs for new hypotheses and the body of evidence largely depend on the premises of the palaeographical interpreter.
§ 23 Moreover, the historical evolution of scripts is a continuum and has to be divided into discrete categories. How this is executed largely depends on the chosen criteria for analysis. Additional, external criteria, such as the evaluation of how a script relates to other formal and artistic elements or to other non-Latin scripts, could also be employed in the classification of scripts. It is, however, impossible to address and explore these criteria by means of Computer Vision and image analysis.
In this context, performing automated clustering or similarity
ranking on medieval scripts is not only a complex task; it is above all, as often in
Computer Vision, an
ill-posed problem according to the
definition of Jacques Hadamard (Bertero, Poggio, and
Torre 1988; Hadamard 1902), which is to say
that it is a problem that has no solution or, at least, no acknowledged
solution. If there were a solution to this problem, it would not be a single
solution, because it does not change continuously in accordance with the initial
conditions (which were fixed at the production stage in terms of time and
place), but according to the weight given to each criterion.
§ 25 When applied to Computer Vision, this epistemological problem is rendered even more complex, because of the lack of representative datasets. The datasets currently used in computer science for Latin scripts, such as the Saint Gall Database (FKI Research Group on Computer Vision and Artificial Intelligence 2012), do not represent the wide variety of scripts that would allow us to address the issue of categorising and clustering the total written production of the Western Middle Ages, which can be estimated at c. 70 000 medieval manuscripts in France and around 800 000 worldwide, excluding archival material.
§ 26 To get a more accurate idea of the great variety of scripts throughout the Middle Ages and an overview of their diversity, there is only one resource: the catalogues of dated and datable manuscripts. These publications are indeed the sole resource that records metadata about thousands of medieval manuscripts and provides almost systematically a photographic reproduction of each. The endeavour began in 1953 and catalogues have been published in Austria, Belgium, France, Germany, Italy, Sweden, Switzerland, and in the Netherlands, United Kingdom, and the Vatican City. The Manuscrits datés project aims to identify every codex for which scholars have ascertained information regarding the date and place of production; the most recent volume was published in 2013 (Comité international de paléographie latine and Muzerelle 2012; Muzerelle 2013b). Only the French collection has an integrated cumulative index at its disposal, which we can use for this inquiry. There are around 5000 dated manuscripts identified in French libraries (plus more than 2000 manuscripts of certain origin, and no fewer than 3200 manuscripts mentioned in order to reject misleading indications written by the scribes or owners, e.g.: colophons copied by scribes with the date of the original and not of the copy; false attributions to famous authors and previous owners). These dated manuscripts provide us with the core dataset to investigate the history of script, as has been shown by Derolez in his study of Gothic book scripts (Derolez 2003). As for script classification, the great collection of reproductions ensures that all types from all periods and places are represented.
Provisional disclaimer: Computer Vision and the dynamics of script
§ 27 The history of script consists of continual transformations and influences, the resurrection of long extinct scripts, and the gradual individuation of contemporaneous ones. As such, the clustering of medieval scripts is largely an ill-posed problem. Palaeographers have specified the features to be analysed, but some of these remain ambiguous. One key feature in the analysis of script is ductus: the fact that the very essence of handwriting itself is a movement before being a trace must be stressed. Analysis of ductus could suggest a means of overcoming some of the difficulties in script classification, but this very important aspect of script remains largely unexplored and scarcely within the reach of Computer Vision (Stutzmann 2014b), even if the codebook approach in the Graphem project was an attempt to uncover movement-related information through the subdivision of each letter into its fundamental strokes (Daher et al. 2011; Herzog, Neumann, and Solth 2011).
§ 28 Computer Vision cannot address the dynamical component of the script and cannot rely on a ground-truth, in the sense that the ground-truth would reflect historical reality. Regarding the history of scripts, the only ground-truth that the humanities can yet supply is a corpus annotated according to at least one of the extant classification systems. Such a corpus is described in the following section.
Towards a new tool for evaluating results: The French Scripts Database
A new database is proposed in order to evaluate newly developed tools and
explore the history of scripts. It is based on Muzerelle's index of dated
manuscripts (Muzerelle 2006) and consists of images,
categories, and metadata. While the associated categories and metadata constitute a
ground truth in the sense of computer science, the database
is more of a heuristic instrument, which cannot pretend to be the adequate
representation of the historical reality. In the following section, we discuss the
extent of the image collection (III.1), previous work
done on this image collection (III.2), and how its
nature has been changed by adding metadata as a ground-truth (III.3).
§ 30 The image collection used in the evaluation protocol is based on the collection of 9800 images from the French catalogues of dated and datable manuscripts (Catalogues des manuscrits datés). Assembled during research and constituting part of it primarily as a (visual) documentation file, the collection of images far exceeds the published volumes of plates in number. Not only does it often provide more than one image for a single manuscript, but it also covers manuscripts for which there are no plates in the printed catalogues (2600 of the 5240 photographs representing dated and datable manuscripts remain unpublished).
§ 31 The extent of the image collection and its technical characteristics are, however, not entirely ideal. First, the collection does not exactly correspond to the printed catalogues. This is not only the result of adding useful material: on the one hand, the collection lacks the photographs which were used by the publisher to print the plates (769 of 3469, i.e. 22 %, of the printed plates do not have a digitised version), and, on the other hand, numerous plates reproduce manuscripts which were not retained in the catalogues at all (3518 photographs), or were only included with brief descriptions (1105 photographs).
§ 32 Secondly, the images have not been chosen with Computer Vision applications in mind, but rather in order to document the (complex) medieval reality of script development. Each manuscript is only presented with 10 x 15 cm photographs, so that each image only covers part of the page (except for manuscripts of small dimensions). Moreover, the dataset is not strictly homogeneous, given the great variety of layouts and the presence of rubrics, pen-work flourishes and illuminated initials. For the purposes of handwriting analysis and script classification, difficulties arise from the internal diversity of the medieval artefacts: several handwritings and script types, or handwritings of different sizes may appear on the same page; the same scribe may use a different script or script size to highlight some parts, or later scribes may write additions and comments; lines are skewed; the ruling is often visible and intersects with the writing; lines of text overlap (ascenders and descenders appear in the same interlinear space), as do interlinear glosses or later corrections and additions. In addition, some of the older photographs are of poor quality; however, as the pictures were taken on a large scale, this does not present a major technical or statistical problem.
Thirdly, the representativeness of the set of images should be
increased and enhanced at a later stage. This requires thorough analysis
beforehand. At present, the collection presents a statistical superabundance of
images from later periods. There is, however, no straightforward solution to
this problem, inasmuch as any modification will itself be determined by some
prior assumption regarding what should be represented. Indeed one may define the
target either from a mere statistical point of view, with classes comprising the
same number of samples (e.g. diachronic classes of 50 years with the same number
of samples in each class), or from a palaeographical point of view, creating
classes according to an existing classification system, or from an historical
one in building diachronic/geographic classes to scale and to represent book
production. This diachronic approach proposes to use dated manuscripts and thus
to include more images from single manuscripts, which would in turn reduce the
variability in the classes generated with this method. Moreover, even if there
is no specific
vernacular palaeography (Careri et al. 2001; Careri, Ruby, and Short
2011), the dataset should also encompass more vernacular texts.
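The first, purely statistical option mentioned above (diachronic classes of 50 years with the same number of samples in each) could be sketched as follows. This is an illustration under stated assumptions rather than a procedure applied to the collection: the function name `balanced_sample`, the `(image_id, year)` input format, and the choice to drop periods with too few images are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_class, bin_years=50, seed=0):
    """Draw the same number of images from each period of `bin_years` years.

    `records` is an iterable of (image_id, year) pairs. Periods holding
    fewer than `per_class` images are left out (an assumption of this sketch).
    """
    bins = defaultdict(list)
    for image_id, year in records:
        # Group by the first year of the period, e.g. 1249 -> 1200.
        bins[(year // bin_years) * bin_years].append(image_id)

    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = {}
    for start in sorted(bins):
        ids = bins[start]
        if len(ids) >= per_class:
            sample[start] = rng.sample(ids, per_class)
    return sample
```

Dropping under-represented periods is itself one of the prior assumptions discussed above; a different policy (for instance, keeping every image of a sparse period) would yield a different target distribution.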
§ 34 The present image collection, despite its imperfections, is the best available to date. It has already been used in the ANR Graphem research program.
Previous work on the image collection
§ 35 The ANR Graphem project (ANR-07-MDCO-006), conducted by an interdisciplinary research consortium from 2008 to 2011, aimed to produce automated classifications of scripts, by enhancing data visualisation and developing content-based image retrieval software (Cloppet et al. 2011; Daher et al. 2011; Joutel 2011; Lebourgeois and Moalla 2011; Leydier 2009; Muzerelle 2009; Siddiqi, Cloppet, and Vincent 2011).
§ 36 Precision and recall were evaluated separately for each content-based image retrieval tool, using a different method each time, according to the sample of images to which each method had been applied.
§ 37 Among the methods used were the following:
- 3D projection and display of photographs onto a graph in order to evaluate visually whether handwritings in the same zone are similar (Moalla)
- identification ex ante of the 5 best matches for one specimen in a script category and evaluation of their ranking as a result of image retrieval; the software used six different features, so the test had to vary according to the weight of each feature and according to 6 different types of script (Siddiqi)
- enumerating how many scripts matched in date and script type among the eight nearest retrieved matching neighbours (Moalla, Lebourgeois, Joutel)
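The last of these methods, counting matches among the eight nearest retrieved neighbours, can be illustrated with a short sketch. It is not the code used in the Graphem project: the function name `neighbour_agreement`, the use of Euclidean distance, and the dictionary inputs are assumptions chosen for clarity. A label may be any comparable value, for example a pair of script type and date range.

```python
import math


def neighbour_agreement(features, labels, query, k=8):
    """Fraction of the k nearest neighbours of `query` that share its label.

    `features` maps image ids to equal-length lists of numbers;
    `labels` maps image ids to a label such as (script type, date range).
    Euclidean distance is an assumption of this sketch.
    """
    distances = sorted(
        (math.dist(features[query], features[other]), other)
        for other in features
        if other != query
    )
    nearest = [other for _, other in distances[:k]]
    if not nearest:
        raise ValueError("at least one other image is required")
    matches = sum(labels[other] == labels[query] for other in nearest)
    return matches / len(nearest)
```

Averaging this value over all annotated images would give a single figure per method, which is one possible way of making the results of different tools comparable.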
§ 38 Besides diverse evaluation methods, the project also used a varied sample of data, ranging from a sample of 300 images to the complete set of 9800 images. In addition, the sample included the same 300 images after clean-up and removal of initials (Gurrado et al. 2011).
As we undertook this evaluation, we discovered first-hand that the
results of several methods cannot be compared when they are not assessed with a
single, shared evaluation protocol. Moreover, the implemented evaluation protocols are highly inaccurate
and it is impossible to automate their application in any way. Thus, on the one
hand, they are heterogeneous, unscalable, and not automated, and, on the other
hand, they do not really address the desire for human-driven interpretation. For
example, while a field was provided to record whether an image was considered
poor quality or not historically representative, there was no way to evaluate
the impact of such characteristics in the retrieval of nearest matching
neighbours. The validation of this retrieval was based on statistics and
moderated by means of the human
palaeographical eye, whose
adjudications were recorded as comments. Yet, the program offered no possibility
to pursue the causes and reasons of mixed results in greater depth.
§ 40 In order to overcome these shortcomings, we integrated the human-driven classification and interpretation of the images in a database.
Available metadata in the present database
§ 41 In response to the shortcomings discussed above, metadata provided in the database was enhanced, and, at present, the fields are as follows:
- bibliographic reference to the catalogue of dated manuscripts
- link to digitised image
- quality of image (technically and for representativeness)
- script type(s): one or more of the classification terms (uncial, semi-uncial, caroline, praegothica, textualis, rotunda, cursiva, cursiva antiquior, hybrida, semitextualis, humanistic, cursive humanistic)
- formality (currens, currens-libraria, libraria, libraria-formata, formata)
- comments, esp. simultaneous presence of different script types
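By way of illustration, the fields listed above could be represented as a record of the following kind. This is a hypothetical sketch: the class name `ImageRecord`, the attribute names, and the split of image quality into two yes/no values are assumptions, and the actual database schema may differ.

```python
from dataclasses import dataclass
from typing import List

# The five formality values listed above.
FORMALITY_LEVELS = (
    "currens", "currens-libraria", "libraria", "libraria-formata", "formata",
)


@dataclass
class ImageRecord:
    """One annotated image (attribute names are illustrative)."""
    catalogue_reference: str    # bibliographic reference to the catalogue
    image_link: str             # link to the digitised image
    technical_quality_ok: bool  # quality of image, technical side
    representative: bool        # quality of image, representativeness
    script_types: List[str]     # one or more classification terms
    formality: str              # one of FORMALITY_LEVELS
    comments: str = ""          # e.g. presence of several script types

    def __post_init__(self):
        if self.formality not in FORMALITY_LEVELS:
            raise ValueError(f"unknown formality: {self.formality!r}")
        if not self.script_types:
            raise ValueError("at least one script type is required")
```

Allowing several script types per record reflects the fact, noted above, that different scripts may appear on the same page.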
The database records letter forms only in terms of
script type(s) and ignores features such as angularity, regularity,
slant, density, angle of writing, ductus, module, weight, or writing support.
Such information could be added to the database later as averaged metrics.
It can be introduced only at a later stage for the following
reasons: first, measuring these features manually is a very time-consuming task
(Gurrado 2009), and we would argue that part of
the measuring process should be automated (height of minims, ruling unit, length
of ascenders and descenders, length of upper parts of d or t); secondly, with
respect to classification, there is no consensus on how to measure these
features and there are different definitions among palaeographers as well as a
great divide between computer scientists and palaeographers on some terms (Stutzmann and Tarte 2014). Only automated
measurements and the integration of further features in a unified manner for all
images will allow for refined classification of script.
§ 43 Nevertheless, as they are, the image collection and associated database provide an integrated tool to evaluate the results of Computer Vision and to check a specific classification, chronology, and description of degree of formality. In the future, other classifications and additional features may be added and compared. The ground-truth provided by palaeographers is one way of interpreting data and need not be the only one. On the contrary, automated methods will not only have to reproduce one particular classification, but also serve as a tool to ascertain the inner coherence of each classification and its graphical underpinnings or overlaps.
§ 44 In their current state, the 9800-image collection and the database containing the metadata have been used to visually evaluate two different experiments. In both cases, unsupervised image analysis was performed on the complete set of images using N different features for each experiment. The results of this analysis are stored in CSV files containing 9800 lines (one per image) and N columns (one per feature); each value is a number normalised to the range 0 to 1. The results are visualised through Principal Component Analysis and 3D-projection using the Explorer 3D visualisation software developed by Matthieu Exbrayat and Lionel Martin (Exbrayat and Martin 2004).
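The data flow described here (a feature matrix with one row per image, values scaled to the 0 to 1 range, then a projection onto three principal components) can be sketched as follows. This is not the Explorer 3D implementation: a plain SVD-based PCA in NumPy is swapped in, the function names are assumptions, and random numbers stand in for the 9800 x N feature file.

```python
import numpy as np


def min_max_normalise(features):
    """Scale every column (feature) to the range 0..1."""
    lowest = features.min(axis=0)
    span = features.max(axis=0) - lowest
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    return (features - lowest) / span


def pca_project(features, n_components=3):
    """Project rows onto the first principal components (via SVD)."""
    centred = features - features.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt[:n_components].T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return coords, explained[:n_components]


# Stand-in data: 200 "images" with 50 features each. A real file could be
# loaded with np.loadtxt("features.csv", delimiter=",") (hypothetical name).
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 50))

normalised = min_max_normalise(raw)
coords, explained = pca_project(normalised)
print(coords.shape)   # one 3D point per image
print(explained)      # share of variance carried by each of the three axes
```

The share of variance printed at the end gives a first indication of how much of the N-dimensional information survives in the three displayed axes.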
§ 45 Since the Explorer 3D software allows us to link the analysed data to metadata, one may display in different colours the dots representing the analysed images according to specific criteria, such as date, place, type of script, formality etc. In particular, in the Project we decided to visualise the distributions according to:
- date for the complete corpus
- script type: Caroline (c. 9th-10th cent.), late-Caroline (c. 11th cent.), praegothica (c. 12th cent.), humanistic scripts (c. 15th-16th cent.)
- script type: praegothica, textualis, rotunda, cursiva, hybrida, and humanistic
§ 46 Even if the image analysis encompasses all the photographs, only the images with sufficiently accurate metadata are displayed, in order to keep the visualisation relatively simple and legible.
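The display logic of the two preceding paragraphs (colour each projected point by one metadata criterion, and leave out images whose metadata is not accurate enough) might be sketched as below. The function name and the use of `None` for missing or insufficient metadata are assumptions made for this illustration; the coordinates and labels are invented.

```python
from collections import defaultdict


def group_points_by_label(coords, labels):
    """Group projected points by a metadata label, one group per colour.

    Points whose label is None (metadata missing or not accurate enough)
    are analysed upstream but not displayed, so they are skipped here.
    """
    groups = defaultdict(list)
    for point, label in zip(coords, labels):
        if label is None:
            continue
        groups[label].append(point)
    return dict(groups)


# Invented 3D coordinates and script-type labels.
coords = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.0), (0.9, 0.8, 0.7), (0.5, 0.5, 0.5)]
script_types = ["textualis", None, "cursiva", "textualis"]

groups = group_points_by_label(coords, script_types)
for label, points in sorted(groups.items()):
    print(label, len(points))
```

The same function can be reused with dates or formality levels as labels, which corresponds to switching the colouring criterion in the visualisation.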
§ 47 In section IV.1 below, we describe the first experiment performed by Lior Wolf, of the Blavatnik School of Computer Science, Tel Aviv University (http://www.cs.tau.ac.il/~wolf/). Wolf had absolutely no previous knowledge of the dataset before applying tools that had been mainly developed for the Cairo Genizah. There were 500 features for each image.
§ 48 The second experiment is described in section IV.2. It was performed by Frank Lebourgeois of the LIRIS Computer Science lab, INSA in Lyon (http://liris.cnrs.fr/membres/?id=834). Lebourgeois provided the raw results of the analysis performed as part of the ANR Graphem research program. There were 1024 features for each image.
Experiment 1 (Wolf)
§ 49 In Wolf's experiment, the general characteristics of handwritings can be correctly identified. With respect to the date, for instance, the projection in Figure 5 shows a relatively homogeneous group for the manuscripts from the 6th to the 11th cent., a mixed group for the 12th to the 14th cent., interspersed with the scripts from the 15th and 16th cent., which are scattered across a wide field, but are mainly clustered in three groups, which prove to be historically coherent: textualis formata / textualis currens / humanistic script.
§ 50 Indeed, if, as in Figure 6, we use the same data and project the script type for Caroline (c. 9th-10th cent.), late-Caroline (c. 11th cent.), praegothica (c. 12th cent.), and humanistic scripts (c. 15th-16th cent.), we notice that the humanistic scripts are neatly located between late-Caroline scripts and praegothica scripts. This result is both historically coherent and significant, since humanistic scripts were created in the 15th century in imitation of 11th- and 12th-century manuscripts.
§ 51 Moreover, the analysis of textualis, textualis meridionalis (rotunda), cursiva and hybrida scripts in Figure 7 reveals evidence of two crucial phenomena. On the one hand, textualis and textualis meridionalis (rotunda) are clearly separated from one another and from cursiva and hybrida; on the other hand, cursiva and hybrida completely overlap with one another. In this regard, the late medieval scripts seem to be organised around three poles, viz., textualis, textualis meridionalis, and cursiva+hybrida rather than textualis, cursiva, and hybrida. This result, based on purely morphological considerations, adequately reflects received palaeographical understanding, since hybrida scripts are based on cursiva, and their separation has been debated. It is, however, most unexpected to see this near fusion of cursiva and hybrida, for we have underlined above that many hybrida scripts are very close to some textualis formata scripts.
§ 52 If we add, as in Figure 8, the praegothica and the humanistic scripts to this picture, another fascinating configuration appears: praegothica scripts are located at the border between textualis and textualis meridionalis, as one would expect from their historical position, and humanistic handwritings are in a distinct field of their own, obviously close to praegothica.
§ 53 A final observation should be made regarding formality, which appears to be a major criterion. If we observe all the tagged scripts, whatever the script category, formal (formata) scripts and informal (currens) scripts are at the two ends of the large crescent (Figure 9).
§ 54 More in-depth analysis is needed to observe how various criteria correlate with each other and what the specific features of the images in the overlapping zones are. Nevertheless, some conclusions can already be drawn: Computer Vision algorithms reflect each category of the formal palaeographical analysis (script type, date, formality) and every script type of the late Middle Ages except for cursiva and hybrida scripts, which are merged in a single category. Moreover, the relative positions of the groups of scripts are in agreement with the history of scripts (humanistic handwritings are positioned between late-Caroline and praegothica; hybrida and cursiva appear together; textualis meridionalis is located between praegothica and textualis).
Experiment 2 (Lebourgeois)
§ 55 In the second experiment, which was run with Moalla & Lebourgeois's algorithms, a major discrepancy emerges. As noted above, image analysis was performed on 9800 script samples, and a CSV file with 1024 numeric values for each image was produced (Lebourgeois and Moalla 2011). Using the same method of evaluation as described above, the PCA projection in a 3D-space results in three clearly distinct groups. However, if we try to identify matching criteria according to our knowledge about the history of scripts (date, place, imitations, revivals, evolutions) or with respect to our perception (formality), none of them appears to be clearly reflected in the resulting chart (Figure 10, Figure 11). Formality is the sole feature which can be tentatively linked to the groups, since formata scripts are distributed across two of the three groups and seem to be separated from currens scripts through axes X1 and X2 (Figure 12).
Comparison and interpretation of results
§ 56 The results of both experiments are strikingly divergent. In the first experiment based on Wolf's algorithm, the palaeographer can identify patterns and evolutions that look familiar. As a consequence, these automated processes prove to be an efficient tool with which to pre-categorise and date handwritings. We plan to analyse the results in greater depth and to cross-reference the categories in order to ascertain accuracy, to measure the precision of the results, to explain the overlaps, and to identify under which conditions or on which scripts the algorithms perform best. In the second experiment, it is as yet unclear why scripts are scattered in the 3D-space through the Principal Component Analysis.
§ 57 Several factors are involved and make interpretation difficult. The disconnection between numerical values and graphical characteristics is the very first obstacle. Indeed, among the numerous features used in Computer Vision, many seem to be applied without a clear understanding of their graphical meaning. Although Zernike polynomials and Haralick's textural features are evidently defined according to a graphical substance (Haralick, Shanmugam, and Dinstein 1973), their application to scripts remains to be interpreted. A number of questions have yet to be resolved. For example: Is script a texture? What link can be established between direction (of contour) and slant? What is the consequence of rotation invariance (e.g. are letters p and d one and the same)? Silhouette or skeleton? Local or global analysis? What is the significance of script types and how do they correlate with one another?
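The question about rotation invariance can be made concrete with a toy example. In the sketch below, a crude 5 x 4 pixel glyph of the letter p is rotated by 180 degrees, which yields a d. A deliberately simple rotation-invariant descriptor (the sorted distances of ink pixels from their centroid) then returns the same values for both letters. The glyphs and the descriptor are assumptions made for this illustration; they are not features used in either experiment.

```python
import numpy as np

# Crude binary glyph of "p": bowl at the top, stem going down on the left.
P = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])

# A half-turn maps it onto "d": stem going up on the right, bowl at the bottom.
D = np.rot90(P, 2)


def centroid_distance_signature(glyph):
    """Sorted distances of ink pixels from their centroid.

    Rotating the glyph moves the pixels and the centroid together, so the
    distances, and hence this signature, do not change.
    """
    ys, xs = np.nonzero(glyph)
    cy, cx = ys.mean(), xs.mean()
    return np.sort(np.hypot(ys - cy, xs - cx))


same_pixels = np.array_equal(P, D)
same_signature = np.allclose(
    centroid_distance_signature(P), centroid_distance_signature(D)
)
print(same_pixels)      # the two glyphs differ as images
print(same_signature)   # but the rotation-invariant descriptor cannot tell
```

For a reader, p and d are distinct letters; for a fully rotation-invariant feature, they are one and the same shape.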
§ 58 Moreover, in an N-dimensional analysis, with N ≥ 500, even if the computed features initially convey a graphical meaning, it remains highly challenging to explain historically or to interpret palaeographically the combination of several features to discern an axis in the projection. In addition, the scalability or expansion of these analyses and the identification or categorisation of as-yet-unidentified handwritings is a far from trivial question.
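Why a projection axis is hard to read can be illustrated by inspecting its loadings, that is, the weight each original feature receives in the axis. In the sketch below, which rests on synthetic data and an assumed helper name, one feature (index 42) is artificially given a much larger variance than the others, so that it dominates the first axis. In real data the weights are typically spread over many features, and no single one names the axis.

```python
import numpy as np


def top_loadings(features, axis_index=0, k=5):
    """Return the k features with the largest absolute weight on one axis."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    weights = vt[axis_index]              # one weight per original feature
    order = np.argsort(-np.abs(weights))[:k]
    return [(int(i), float(weights[i])) for i in order]


# Synthetic stand-in: 300 "images", 500 features in the range 0..1.
rng = np.random.default_rng(1)
features = rng.random((300, 500))
features[:, 42] *= 20   # artificial: make one feature dominate the variance

top = top_loadings(features)
for index, weight in top:
    print(index, round(weight, 3))
```

Every axis is a weighted sum of all N features; a palaeographical reading of an axis therefore presupposes a graphical reading of each heavily weighted feature.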
§ 59 The first experiment reveals that an expert system is able to emulate human expertise. The second experiment may be based on underlying formal criteria that have not yet been identified by palaeographers. At the same time, the latter experiment recalls another methodological difficulty: in this experiment, the calculated features were produced without explanation of the metrics to which they refer. In particular, it is not clear how these metrics relate to a graphic reality (what do the axes mean? do they correspond to the presence of round forms or to the regular repetition of similar shapes? etc.).
Conclusion: Searching for expert systems or exploring new approaches?
§ 60 The difference between the two experiments should not, therefore, be interpreted only as success and failure. Indeed, the automated classification of medieval scripts is an ill-posed problem, and the notions of success and accuracy largely depend on the expectations which are brought into the experiment through the idea of ground-truth and on the interpretation of the results. As a result of the research conducted here, we can draw some firm conclusions and open a new perspective.
§ 61 The multiple and parallel evolutions of scripts during the medieval millennium hamper automated classification and clustering. Moreover, the lack of consensus among palaeographers regarding criteria for analysis makes automated classification, per se, an ill-posed problem. To change the conditions of such research, we have built a database containing 9800 script samples dating from the 6th century to 1600 A.D. with corresponding metadata (date, place, script type, formality) on the basis of the French catalogues of dated and datable manuscripts. The database integrates one possible classification as ground-truth, and has two aims. First, it should serve to measure accuracy and recall in an evaluation protocol for Computer Vision and image analysis applied to historical handwritings. Secondly, on a more theoretical level, it clarifies the roles, scientific responsibilities and underpinnings of the researchers from different fields, and will facilitate dialogue between them.
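An evaluation of this kind might be sketched as follows, comparing the labels produced by an automated method with the ground-truth labels of the database. The paragraph speaks of accuracy and recall; the sketch also computes per-class precision, which is an addition made here as an assumption about what such a protocol would report. The function name and the six example labels are invented for illustration.

```python
from collections import Counter


def evaluate(ground_truth, predicted):
    """Overall accuracy plus per-class (precision, recall)."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(ground_truth, predicted):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1   # guessed this class wrongly
            false_neg[truth] += 1   # missed the real class
    accuracy = sum(true_pos.values()) / len(ground_truth)
    scores = {}
    for label in set(ground_truth) | set(predicted):
        p_den = true_pos[label] + false_pos[label]
        r_den = true_pos[label] + false_neg[label]
        precision = true_pos[label] / p_den if p_den else 0.0
        recall = true_pos[label] / r_den if r_den else 0.0
        scores[label] = (precision, recall)
    return accuracy, scores


# Invented labels: the method merges hybrida into cursiva.
truth = ["textualis", "textualis", "cursiva", "hybrida", "hybrida", "rotunda"]
guess = ["textualis", "rotunda", "cursiva", "cursiva", "cursiva", "rotunda"]

accuracy, scores = evaluate(truth, guess)
print(accuracy)
for label in sorted(scores):
    print(label, scores[label])
```

In this invented case the merging of hybrida into cursiva leaves the recall of cursiva intact but lowers its precision and reduces the recall of hybrida to zero, which is how an overlap between two categories would surface in the figures.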
§ 62 In the first instance, the notion of classification of scripts is a matter that concerns palaeographers. Thereafter, two different paths may be taken. In both, the methods, scope, and targets of analysis must be carefully considered and described, since the very ground truth and proof systems differ from one field of science to another. On the one hand, palaeographers have the task of assessing a stable ground truth ex ante, and computer scientists may develop an expert system which will emulate human expertise. According to this hypothesis, the developed system may be an opaque black box whose decisive features are unknown. Despite its opacity, the expert system will then be useful for classifying scripts on a large scale (for example in large digital libraries) and also to spot inaccuracies or discrepancies within a given classification. On the other hand, Computer Vision uses its own acknowledged features, and palaeographers attempt to give an interpretation to the results ex post. This latter path has been taken in the research done for the GRAPHEM research project by both Muzerelle and myself in order to interpret the semi-supervised clustering from either a formal or a historical point of view (Muzerelle 2009; Stutzmann 2014a).
§ 63 This latter method may be fruitful, but is unsatisfactory from a scientific point of view, since it is impossible to assert the relevance of the results. As a consequence, we suggest improving this method through a dialogue and preliminary experiments in order to ascertain which formal criteria will be analysed (e.g. regularity, roundness, angularity) through which features. Formal criteria can be defined freely outside the scope of what is at present acknowledged by palaeographers, but they must be clearly defined. As was suggested during the Dagstuhl workshop 'Computation and Palaeography: Potentials and Limits' (Hassner et al. 2013), such definitions may be made in the form of mid-level features (a combination of features which would correspond to an identifiable graphical phenomenon such as one-compartment a), or in a form yet to be determined. If the features and formal criteria are defined beforehand, the results will serve to support the identification of distinguishing features. Palaeographers will be in turn empowered and able to weigh criteria and state how they affect classification. Image analysis is, then, an exploratory endeavour, and cannot rely upon any ground truth. Any interpretation of the results would have to be elaborated in interdisciplinary discussion in order to give a graphical meaning to the mathematical figures (Stutzmann and Tarte 2014).
A preliminary version of this paper was presented at the International workshop on pattern recognition and historical document analysis = 20th international SAOT workshop (Erlangen, 14-15 June 2013).
The research is partly funded by the Graphem research project (Grapheme based retrieval and analysis for palaeographic expertise of medieval manuscripts, ANR-07-MDCO-006, 2007-2011, Agence Nationale de la Recherche) and the ORIFLAMMS research project (Ontology research, Image Feature, Letterform Analysis on Multilingual Medieval Scripts, ANR-12-CORP-0010, 2013-2016, Agence Nationale de la Recherche / Cap Digital).
This paper is largely based on discussions especially with Ségolène Tarte and Lior Wolf. We wish to thank Lior Wolf who kindly analysed the corpus presented below and gave permission to present the results, as well as my colleagues Frank Lebourgeois and Katja Monier.
Aussems, Mark. 2007. Christine de Pizan and the scribal fingerprint: A quantitative approach to manuscript studies. Utrecht: Universiteit Utrecht. http://igitur-archive.library.uu.nl/student-theses/2006-0908-200407/UUindex.html.
Aussems, Mark, and Axel Brink. 2009. Digital palaeography. In Kodikologie und Paläographie im digitalen Zeitalter - Codicology and palaeography in the digital age, edited by Malte Rehbein, Patrick Sahle, and Torsten Schaßan, 293–308. Schriften des Instituts für Dokumentologie und Editorik 2. Norderstedt: BoD.
Bischoff, Bernhard. 1954. La nomenclature des écritures livresques du IXe au XIIIe siècle. In Nomenclatures des écritures livresques du IXe au XVIe siècle : Premier colloque international de paléographie latine, Paris, 28-30 Avril 1953, edited by Bernhard Bischoff, Gerard Isaac Lieftinck, and G. Battelli, 7–14. Colloques Internationaux du C.N.R.S. - Sciences Humaines 4. Paris: Édition du C.N.R.S.
Careri, Maria, Françoise Fery-Hue, Françoise Gasparri, Geneviève Hasenohr, Gillette Labory, Sylvie Lefèvre, Anne-Françoise Leurquin, and Christine Ruby. 2001. Album de manuscrits français du XIIIe siècle. Mise en page et mise en texte. Roma: Viella.
Ceccherini, Irene. 2007. Tradition cursive et style dans l'écriture des notaires florentins (v. 1250-v. 1350). Bibliothèque de l'École des Chartes 165.1, Écritures latines du Moyen Âge, tradition, imitation, invention: 167–85.
───. 2008. La genesi della scrittura mercantesca. In Régionalisme et internationalisme. Problèmes de paléographie et de codicologie du Moyen Âge. Actes du XVe colloque du Comité international de paléographie latine [Vienne, 13–17 Septembre 2005], edited by Otto Kresten and Franz Lackner, 123–37. Denkschriften / Österreichische Akademie der Wissenschaften, philosophisch-historische Klasse 364. Wien: Verlag der Österreichischen Akademie der Wissenschaften.
Ciula, Arianna. 2005. Digital palaeography: Using the digital representation of medieval script to support palaeographic analysis. Digital Medievalist 1. http://www.digitalmedievalist.org/journal/1.1/ciula/.
Cloppet, Florence, Hani Daher, Véronique Églin, Hubert Emptoz, Matthieu Exbrayat, Guillaume Joutel, Frank Lebourgeois, Lionel Martin, Ikram Moalla, Imran Siddiqi, and Nicole Vincent. 2011. New tools for exploring, analysing and categorising medieval scripts. Digital Medievalist 7. http://www.digitalmedievalist.org/journal/7/cloppet/.
Comité international de paléographie latine, and Denis Muzerelle. 2012. Manuscrits datés: État des publications. Palaeographia. Accessed May 12. http://www.palaeographia.org/cipl/cmd.htm.
Daher, Hani, Véronique Églin, Stéphane Brès, and Nicole Vincent. 2011. Étude de la dynamique des écritures médiévales. Analyse et classification des formes écrites. Gazette du Livre Médiéval 56-57: 21–41.
Derolez, Albert. 2003. The palaeography of Gothic manuscript books from the twelfth to the early sixteenth century. Cambridge Studies in Palaeography and Codicology 9. Cambridge: Cambridge University Press.
Exbrayat, Matthieu, and Lionel Martin. 2004. Explorer3D. http://www.univ-orleans.fr/lifo/software/Explorer3D/.
FKI Research Group on Computer Vision and Artificial Intelligence. 2012. Saint Gall Database — Computer Vision and Artificial Intelligence. IAM Institut für Informatik und angewandte Mathematik. http://www.iam.unibe.ch/fki/databases/iam-historical-document-database/saint-gall-database.
Ginzburg, Carlo. 1979. Spie. Radici di un paradigma indiziario. In Crisi della ragione. Nuovi modelli nel rapporto tra sapere e attività umane, edited by Aldo Gargani, 59–106. Einaudi Paperbacks 106. Torino: Einaudi. http://www.associazionetolba.org/socialspread/images/materialeinformativo/Losi/spie_note_pag.pdf.
Gurrado, Maria. 2009. Graphoskop: Uno strumento informatico per l'analisi paleografica quantitativa. In Kodikologie und Paläographie im digitalen Zeitalter - Codicology and palaeography in the digital age, edited by Malte Rehbein, Patrick Sahle, and Torsten Schaßan, 251–59. Schriften des Instituts für Dokumentologie und Editorik 2. Norderstedt: BoD. http://kups.ub.uni-koeln.de/2973/.
Gurrado, Maria, Denis Muzerelle, Marc H Smith, and Dominique Stutzmann. 2011. Projet ANR Graphem. Livrable N° 28. Évaluation finale de la plateforme et des outils en ligne. Analyse critique des Résultats. [internal report]
Hassner, Tal, Malte Rehbein, Peter A. Stokes, and Lior Wolf. 2012. Schloss Dagstuhl: Seminar homepage. 18.–21. September 2012, Dagstuhl perspectives workshop 12382. Computation and palaeography: Potentials and limits. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. http://www.dagstuhl.de/12382.
───. 2013. Computation and palaeography: Potentials and limits. Dagstuhl Manifestos 2: 14–35. doi:10.4230/DagMan.2.1.14.
Hassner, Tal, Dominique Stutzmann, Ségolène Tarte, and Robert Sablatnig. 2014. Schloss Dagstuhl: Seminar homepage. July 20–24, 2014, Dagstuhl seminar 14302. Digital palaeography: New machines and old texts. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. http://www.dagstuhl.de/14302.
Hofmeister, Wernfried, Andrea Hofmeister-Winter, and Georg Thallinger. 2009. Forschung am Rande des paläographischen Zweifels: Die EDV-basierte Erfassung individueller Schriftzüge im Projekt DAmalS. In Kodikologie und Paläographie im digitalen Zeitalter - Codicology and palaeography in the digital age, edited by Malte Rehbein, Patrick Sahle, and Torsten Schaßan, 261–92. Schriften des Instituts für Dokumentologie und Editorik 2. Norderstedt: BoD. http://kups.ub.uni-koeln.de/volltexte/2009/2974/; http://www.i-d-e.de/wordpress/wp-content/uploads/2009/08/hofmeister_thallinger.pdf.
Leydier, Yann. 2009. Ulysse 0.5 — Moteur de recherche de mots en mode image | Graphem. Graphem: Projet ANR. http://liris.cnrs.fr/graphem/?p=171.
Mooney, Linne, Simon Horobin, and Estelle Stubbs. 2012. Late medieval English scribes. York: Centre for Medieval Studies at the University of York, University of Oxford. http://www.medievalscribes.com/index.php?nav=off.
Muzerelle, Denis. 2006. Manuscrits datés (France) - Index général interactif. Aedilis. http://aedilis.irht.cnrs.fr/cmdf/.
───. 2009. Graphem for Dummies. http://www.uib.no/filearchive/graphemfords.pdf.
Oeser, Wolfgang. 1994. Beobachtungen zur Strukturierung und Variantenbildung der Textura. Ein Beitrag zur Paläographie des Hoch- und Spätmittelalters. Archiv für Diplomatik, Schriftgeschichte, Siegel- und Wappenkunde 40: 359–439.
Siddiqi, Imran, and Nicole Vincent. 2009. Combining contour based orientation and curvature features for writer recognition. In Computer analysis of images and patterns, edited by Xiaoyi Jiang and Nicolai Petkov, 245–52. Lecture notes in Computer Science 5702. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/978-3-642-03767-2_30.
Smit, Jinna. 2010. Meten is weten? De toepassing van het Groningen Intelligent Writer Identification System (GIWIS) Op Hollandse kanselarij-oorkonden, 1299-1345. Bulletin de la Commission royale d’histoire 176.2, Chancelleries princières et scriptoria dans les anciens Pays-Bas Xe-XVe siècles / Vorstelijke kanselarijen en scriptoria in de Lage Landen, 10de-15de eeuw: 343–60.
Smith, Marc H. 2004. [review] Derolez (Albert). The palaeography of Gothic manuscript books. From the twelfth to the early sixteenth century. Cambridge: Cambridge University Press, 2003. Scriptorium 58: 274–79.
───. 2011. Les formes de l'alphabet latin, entre écriture et lecture. Paper presented at the Colloque de rentrée 2011: La vie des formes, Collège de France (Paris), Paris, France, October 14. http://www.college-de-france.fr/site/colloque-2011/symposium-2011-10-14-10h45.htm.
Stokes, Peter A. 2011. Describing handwriting, Part I. DigiPal - Digital resource for palaeography, manuscripts and diplomatic. July 27. http://digipal.eu/blogs/blog/describing-handwriting-part-i/.
Stutzmann, Dominique. 2005. Nomenklatur der gotischen Buchschriften: Nennen? Systematisieren? Wie und Wozu? (Rezension über: Albert Derolez: The palaeography of Gothic manuscript books. from the twelfth to the early sixteenth century. Cambridge U.a.: Cambridge University Press 2003.). IASLonline. http://www.iaslonline.de/index.php?vorgang_id=995.
───. 2010. Paléographie statistique pour décrire, identifier, dater... Normaliser pour coopérer et aller plus loin? In Kodikologie und Paläographie im digitalen Zeitalter 2 - Codicology and palaeography in the digital age 2, 247–77. Schriften des Instituts für Dokumentologie und Editorik 3. Norderstedt: BoD.
───. 2013. Système graphique et normes sociales: Pour une analyse électronique des écritures médiévales. In Medieval autograph manuscripts. Proceedings of the XVIIth colloquium of the Comité International de Paléographie Latine, held in Ljubljana, 7-10 September 2010, edited by Nataša Golob, 429–34. Bibliologia 36. Turnhout: Brepols.
───. 2014a. Conjuguer diplomatique, paléographie et édition électronique. Les mutations du XIIe siècle et la datation des écritures par le profil scribal collectif. In Digital diplomatics. The computer as a tool for the diplomatist, edited by Antonella Ambrosio, Sébastien Barret, and Georg Vogeler, 271–90. Archiv für Diplomatik. Beiheft 14. Köln: Böhlau.
───. 2014b. Paléographie numérique: Projets et perspectives. Écriture Médiévale & Numérique. April 23. http://oriflamms.hypotheses.org/1327.
Stutzmann, Dominique, and Ségolène Tarte. 2014. Digital palaeography: New machines and old texts: Executive summary. Edited by Tal Hassner, Robert Sablatnig, Dominique Stutzmann, and Ségolène Tarte. Dagstuhl Reports 4.7: 112–34 (112–14, 132). doi:10.4230/DagRep.4.7.112.