We propose an automatic method for attributing manuscript pages to scribes. The system uses digital images as published by libraries. From each query page, it extracts approximately letter-size components by means of binarization (ink-background separation), connected component labelling, and further segmentation guided by the estimated typical stroke width. Components are extracted in the same way from the pages of known scribal origin. This allows us to assign a scribe to each query component by nearest-neighbour classification, where the distance (dissimilarity) between two components is the Euclidean distance over simple features capturing the distribution of ink in the bounding box defined by the component. The set of component-level scribe attributions, which typically includes hundreds of components for a page, is then used to predict the page scribe by means of a voting procedure: the scribe who receives the largest number of votes from the 120 strongest component attributions is proposed as the scribe of the page. The attribution process allows the argument behind an attribution to be visualized for a human reader: the writing components of the query page are exhibited along with the matching components of the known pages. This report is thus open to inspection and analysis using the methods and intuitions of traditional palaeography. The present system was evaluated on a data set covering 46 medieval scribes, writing in Carolingian minuscule, Bastarda, and a few other scripts. The system achieved a mean top-1 accuracy of 98.3% for the first scribe proposed for each page, when the labelled data comprised one randomly selected page from each scribe and nine unseen pages per scribe were attributed in the validation procedure. The experiment was repeated 50 times to even out random variation effects.
In 1970, workers involved in the restoration of a chapel in Speyer found a reliquary containing a very old manuscript leaf. Experts called in to examine the item were—we can imagine—excited to find writing in gold and silver ink on purple vellum using a somewhat odd alphabet. These details must soon have led their inquiries in the direction of the evangeliary known as Codex Argenteus, which, after dramatic travels, had ended up in Uppsala. Philologists could later definitely verify that the Speyer leaf was one of those missing from the codex. There are a number of circumstances, e.g. the Gothic language and alphabet, the extraordinary design, and the textual content, which strongly speak in favour of the conclusion that the solitary leaf belongs to the evangeliary.
If we are interested in finding the manuscript home of an odd medieval fragment of a more ordinary kind, say a piece of parchment with Latin text written in some cursive script typical of the 15th century, we face a difficult problem. Just to see that two pages are from the same codex, scribe, or cultural context, and to justify such a conclusion, requires the expertise of a palaeographer. Browsing thousands of 15th-century manuscripts, one by one, library by library, to compare them with an enigmatic leaf would necessitate enormous efforts, even if the most efficient of philologists were to contribute their competence to the project.
Today many libraries are in the process of digitizing their historical collections. This gives us new opportunities to compare manuscripts and to find new connections among them. A modern expert trying to place an odd medieval page in its context of production would most likely use a computer, at least for viewing digitized manuscripts. This article will be concerned with using computational resources to compare parts of manuscripts. To be more specific, it will focus on automatic scribe attribution.
The main purpose of the system described here is to predict, by means of automatic analysis of digital images, which scribe has produced the writing on a manuscript sample. This is essentially a classification problem. In this context, each scribe (or we could say class) is identified as the hand behind a set of given manuscript images. The system has access to a set of writing examples which constitutes a database of known scribes.
A secondary purpose of the present system is to produce arguments for scribe attributions which are comprehensible to a traditional palaeographer or even an ordinary human reader. This means that the classification procedure must follow a series of steps from which we can derive a presentation of the evidence which is compatible with this purpose. The central idea is that we can justify scribe attributions by highlighting similarities between the letters of the manuscript under examination and letters produced by scribes from the database. The system is consequently in the vein of “digital palaeography” (
In connection with this study, we have compiled and published open-source a data set comprising 46 medieval scribes writing in book hand scripts (see Appendix for details).
Knowing who has produced a manuscript is of obvious relevance in disciplines like history, literary studies, and philology. In traditional palaeography (as defined in e.g.
The high costs of non-digital approaches to scribe attribution—or more commonly, “writer identification,” in technical contexts—have motivated researchers to study automatic scribe attribution for both historical and modern documents. The challenging nature of the problem from the point of view of image analysis has also stimulated academic attention. Computational research on modern handwriting overlaps with forensic science, whereas work on historical data belongs to the field of “digital palaeography.” Closely related problems which can also be assigned to this area are script classification (
Most scribe attribution systems for historical manuscripts are based on machine learning and make use of features which can be extracted independently of linguistically informed segmentation and labelling of the writing (
Feature models working in the fashions described above capture, on a document sample level, the distribution of image details much smaller than letters. This means that the models are difficult to visualize in terms comprehensible from a traditional palaeographic point of view. By contrast, Ciula (
Comparing the performance of scribe attribution systems is an intricate task, since different systems target different kinds of writing. Furthermore, evaluation scores for different systems are based on data with varying numbers of writers and different amounts of data available for each writer (
An important metric in validation of scribe attribution systems, and classification systems generally, is the top-1 accuracy score, which considers the highest-ranking prediction for each query item: It is the ratio between the number of true predictions and the total number of predictions. State-of-the-art systems for
In the
In their work on medieval handwriting, Brink (
When given a query example, the current system predicts a scribe selected from a set of individuals, each one defined by labelled manuscript data. The scribe attribution procedure relies on a sequence of processing steps involving two fairly simple classification modules. One advantage of this is that the process uses evidence in a way that is comprehensible to a palaeographer with a traditional understanding of the task. This means that predictions are reached in a way that corresponds to an argument that can be visualized for the user. Another gain is that the system can be applied without a potentially time-consuming training step, as would typically be necessary if models based on machine learning were used. The system was implemented and evaluated in Java.
The operation of the system is guided by a set of parameters. Experiments made during the development phase suggested that the parameter setting described below leads to good performance; it is the one used in the evaluation reported below. The parameter values can arguably also be explained and justified from an a priori understanding of Latin book hand scripts, even if the values are, admittedly, to some extent arbitrary. In work with new data, the system invites retuning of the parameter settings.
In each application of the system, the labelled data are a set of images sampling a certain amount of writing for each scribe. This amount can be just a part of an image, one full image, or several images. Different sizes of the query units (to be attributed to a scribe) are also possible. In the experimental rounds of the evaluation reported below, both the labelled samples and the query units consisted in all cases of one manuscript image. As will be described below, the labelled data were randomly selected from the manuscript data set, and the remaining images (unseen by the system) were used to generate queries in the evaluation procedure. The images of the data set are the high-resolution, state-of-the-art files provided by the libraries, each corresponding to one page or one spread. (The data set is published open-source, see below.)
The first processing steps applied to the manuscript files are cropping, which removes the image margins, and scaling. After that, the system will operate on “binarized” versions of the manuscript images. In these, the pixels only carry a binary value indicating writing foreground (ink) versus background (parchment/paper). This is a considerable reduction of the information content of the images, as colour and greyscale information will not be available in the further processing. The binarization is executed by means of a version of the commonly employed Otsu (
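As a rough illustration of the thresholding idea, the sketch below computes a global Otsu threshold from an 8-bit greyscale histogram by maximizing the between-class variance between ink and background. The class and method names are illustrative, and the published system uses its own version of the method, which may differ in details such as preprocessing; this is a minimal sketch, not the actual implementation.

```java
// Minimal sketch of global Otsu thresholding on an 8-bit greyscale histogram.
// Names are illustrative; the published system uses "a version of" Otsu's method.
public final class OtsuSketch {

    /** Returns the grey level (0-255) that maximises the between-class variance. */
    static int otsuThreshold(long[] histogram) {
        long total = 0, weightedSum = 0;
        for (int g = 0; g < 256; g++) {
            total += histogram[g];
            weightedSum += (long) g * histogram[g];
        }
        long backCount = 0, backSum = 0;
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            backCount += histogram[t];
            if (backCount == 0) continue;
            long foreCount = total - backCount;
            if (foreCount == 0) break;
            backSum += (long) t * histogram[t];
            double meanBack = (double) backSum / backCount;
            double meanFore = (double) (weightedSum - backSum) / foreCount;
            double diff = meanBack - meanFore;
            double variance = (double) backCount * foreCount * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    public static void main(String[] args) {
        // Synthetic bimodal histogram: dark ink around grey level 40,
        // bright parchment around grey level 200.
        long[] hist = new long[256];
        for (int g = 30; g < 50; g++) hist[g] = 500;
        for (int g = 190; g < 210; g++) hist[g] = 2000;
        System.out.println("Estimated threshold: " + otsuThreshold(hist));
    }
}
```

Pixels darker than the returned threshold would then be marked as foreground (ink) in the binarized image.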
The binarized representation allows the system to perform connected component labelling for the purpose of extracting connected regions of ink pixels. These regions, defined as sets of foreground pixels, will typically cover letters and letter sequences. Some of the regions are then further segmented into smaller pieces. The idea behind this is that the segments and a subset of the connected components will correspond to single letters and pairs of connected letters. These image elements will be referred to as “components”, and they form the primary objects of scribe attribution in the current system. The segmentation process is guided by the estimated typical stroke width,
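To make the extraction step concrete, the following sketch performs a basic 4-connected component labelling over a binarized page, of the kind described above. The names are illustrative; the actual system may use different connectivity and filtering, and the further segmentation guided by the stroke width is not shown.

```java
import java.util.ArrayDeque;

// Sketch of 4-connected component labelling over a binarized page,
// where true = ink and false = background.
public final class ComponentLabellingSketch {

    /** Returns a label image: 0 = background, 1..n = connected ink regions. */
    static int[][] label(boolean[][] ink) {
        int h = ink.length, w = ink[0].length;
        int[][] labels = new int[h][w];
        int next = 0;
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || labels[y][x] != 0) continue;
                next++;
                labels[y][x] = next;
                queue.add(new int[] {y, x});
                while (!queue.isEmpty()) {                 // breadth-first flood fill
                    int[] p = queue.poll();
                    int[][] nbs = {{p[0]-1,p[1]}, {p[0]+1,p[1]}, {p[0],p[1]-1}, {p[0],p[1]+1}};
                    for (int[] n : nbs) {
                        if (n[0] < 0 || n[0] >= h || n[1] < 0 || n[1] >= w) continue;
                        if (ink[n[0]][n[1]] && labels[n[0]][n[1]] == 0) {
                            labels[n[0]][n[1]] = next;
                            queue.add(n);
                        }
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        boolean[][] ink = {
            {true,  true,  false, false},
            {false, true,  false, true},
            {false, false, false, true},
        };
        int[][] labels = label(ink);
        System.out.println("Label at row 1, col 3: " + labels[1][3]); // second region, so 2
    }
}
```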
Six parameters expressed as products of a constant and
Extraction of writing components. This example shows a region from page 105 in Cod. Sang. 726 (hand csg0726B, here, from the St. Gallen Stiftsbibliothek). The page has been binarized and rectangles indicate which image components were extracted. Blue rectangles frame components which were produced directly by the connected component labelling, whereas the red ones were the result of further segmentation.
The shape of the image components is represented by a sequence of numeric measurements (features). In other words, these measurements form coordinates in a feature space. This allows similarity between components to be modelled in such a way that distance corresponds to dissimilarity. The features, which are computed with reference to the minimal bounding box enclosing the foreground pixels, characterize the component in terms of the distribution of foreground (ink) pixels as captured by a grid of 8 × 8 equal subrectangles over the bounding box. This gives us 64 features, as illustrated by Figure
The grid corresponding to the features which capture the distribution of foreground (ink) pixels. It consists of 8 × 8 equal subrectangles defined in relation to the bounding box enclosing the image component (from Cod. Sang. 983, p. 69). Each value is the ratio of the number of foreground pixels to the subrectangle area. Reading the image top-down and left-to-right, the feature vector would in this case look something like the following (showing the first eight and last eight values): (0.1, 0.5, 0.7, 0.5, 0.2, 0.1, 0.7, 0.3, …, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.3, 0.0).
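A minimal sketch of this feature extraction is given below, assuming the component is available as a boolean ink mask already cropped to its bounding box. The handling of subrectangle borders and the exact read-out order are illustrative choices, not necessarily those of the published system.

```java
// Sketch of the 8 x 8 ink-distribution features: for each of the 64 equal
// subrectangles of a component's bounding box, the ratio of ink pixels to
// subrectangle area. The mask is assumed to be cropped to the bounding box.
public final class GridFeatureSketch {

    static double[] inkDistribution(boolean[][] component) {
        int h = component.length, w = component[0].length;
        double[] features = new double[64];
        for (int gy = 0; gy < 8; gy++) {
            for (int gx = 0; gx < 8; gx++) {
                // Pixel range of this subrectangle (integer division handles
                // bounding boxes whose sides are not multiples of 8).
                int y0 = gy * h / 8, y1 = (gy + 1) * h / 8;
                int x0 = gx * w / 8, x1 = (gx + 1) * w / 8;
                int area = Math.max(1, (y1 - y0) * (x1 - x0));
                int ink = 0;
                for (int y = y0; y < y1; y++)
                    for (int x = x0; x < x1; x++)
                        if (component[y][x]) ink++;
                // Row-major read-out; any consistent order would serve the classifier.
                features[gy * 8 + gx] = (double) ink / area;
            }
        }
        return features;
    }

    public static void main(String[] args) {
        boolean[][] mask = new boolean[16][16];
        for (int i = 0; i < 16; i++) mask[i][i] = true;   // a diagonal stroke
        double[] f = inkDistribution(mask);
        System.out.println("First feature (top-left cell): " + f[0]); // 2 ink pixels / 4 = 0.5
    }
}
```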
Using the components extracted from the labelled manuscript images, the system predicts a scribe for each component extracted from a query page or spread by means of “nearest neighbour” classification. This means that each component is assumed to have been produced by the scribe behind the most similar (least distant) labelled component. Each prediction has a strength which is inversely related to the distance between the two components, i.e. the shorter the distance between the query component and the closest labelled component, the stronger the prediction. So, for each query image a set of component-level scribe attributions is generated, and these attributions are at the same time ranked on a scale of strength.
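The sketch below illustrates this component-level step: each query component is assigned the scribe of its nearest labelled component under Euclidean distance in the feature space. The record types and names are illustrative; the real system searches over the full set of labelled components from all known scribes.

```java
import java.util.List;

// Sketch of component-level nearest-neighbour attribution. A labelled component
// carries its feature vector and its scribe; the query component is attributed
// to the scribe of the closest labelled component.
public final class NearestNeighbourSketch {

    record Labelled(String scribe, double[] features) {}
    record Attribution(String scribe, double distance) {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Attributes one query component; a smaller distance means a stronger prediction. */
    static Attribution attribute(double[] query, List<Labelled> labelled) {
        Attribution best = null;
        for (Labelled l : labelled) {
            double d = euclidean(query, l.features());
            if (best == null || d < best.distance()) {
                best = new Attribution(l.scribe(), d);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Labelled> labelled = List.of(
            new Labelled("csg0089", new double[] {0.1, 0.5, 0.7}),
            new Labelled("csg0053", new double[] {0.9, 0.2, 0.0}));
        Attribution a = attribute(new double[] {0.2, 0.4, 0.6}, labelled);
        System.out.println(a.scribe() + " at distance " + a.distance());
    }
}
```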
The second main module of the attribution process assigns a scribe to each query manuscript sample by means of a voting procedure. This is based on the arrangement of the component-level predictions in ascending order by the distance score, as described above. A scribe prediction for the query image is generated by voting in two steps: First, the (at most) five scribes who receive the largest number of votes from the top 120 component predictions (or all of them if their number is smaller than that) are determined. After that, the system repeats both the classification of image components and the voting with only the labelled components from these five scribes available, again with voting by the top 120 (or all) component predictions. Finally, the scribe who has received the largest number of votes is returned as the prediction for the query image.
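The vote counting itself can be sketched as follows. In the full two-step procedure this counting is applied twice: first to select the (at most) five candidate scribes, and then again after re-classification of the query components against only those candidates' labelled data. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the page-level voting: the strongest (smallest-distance) component
// attributions, at most `maxVotes` of them, each cast one vote for a scribe.
public final class VotingSketch {

    record Attribution(String scribe, double distance) {}

    /** Returns the scribes ordered by votes received from the top `maxVotes` attributions. */
    static List<String> rankByVotes(List<Attribution> attributions, int maxVotes) {
        List<Attribution> sorted = new ArrayList<>(attributions);
        sorted.sort(Comparator.comparingDouble(Attribution::distance));
        Map<String, Integer> votes = new HashMap<>();
        for (Attribution a : sorted.subList(0, Math.min(maxVotes, sorted.size()))) {
            votes.merge(a.scribe(), 1, Integer::sum);
        }
        return votes.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        List<Attribution> attributions = List.of(
            new Attribution("csg0990B", 0.8),
            new Attribution("csg0990B", 1.1),
            new Attribution("csg0990A", 0.9),
            new Attribution("csg0586", 2.5));
        // With maxVotes = 120, all four attributions vote; csg0990B wins with 2 votes.
        System.out.println(rankByVotes(attributions, 120));
    }
}
```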
As the component-level predictions are based on the pairwise similarity of image components, they can be visualized for a human reader in a straightforward way. The example in Figure
Matched image components. Query components appear to the left and the labelled ones to the right in the cells. Page 325 in the csg0990B sequence was the one under scrutiny. Predictions conforming to the most common decision for the query image, which were correct here, are placed in yellow cells; the other ones appear with a blue background. This outcome consequently spoke strongly in favour of the hypothesis that csg0990B is the scribe.
The example in Figure
We evaluated the scribe attribution system proposed here by applying it to a data set comprising 46 scribes. As mentioned above, each prediction was based on one image of labelled data for each scribe and one image to be classified. We report the mean top-1 accuracy score and give an overview of which incorrect predictions were made.
Each of the 46 scribes was represented by 10 manuscript images in the evaluation data set. When it comes to medieval documents, in particular books, scribes can often only be identified through instances of their work. In the present data set, only a few of the scribes are known by name. The scribes were selected from digitized manuscripts published by a number of websites:
During the development and tuning phase another, disjoint, set of pages from the 36
We designed an experimental set-up to assess the performance of the system using the data set described above. This set-up corresponds to a scenario where images are compared one by one. In each experimental round, one image for each scribe was randomly selected to provide labelled data, i.e. labelled image components were extracted from these 46 images. The remaining 9 × 46 = 414 images, unseen by the system, were used for evaluation. Each experimental round consequently produced 414 predictions, all based on the same labelled data. The experimental procedure was repeated 50 times in order to even out random variation effects.
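The round structure can be summarized in the following sketch, where the predictor argument stands in for the whole attribution pipeline described above (component extraction, nearest-neighbour classification, and voting). The method and parameter names are illustrative, not the authors' code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.BiFunction;

// Sketch of the evaluation protocol: in each round, one page per scribe is drawn
// at random as labelled data and the remaining pages are attributed as queries.
public final class EvaluationSketch {

    /**
     * @param pagesByScribe the page identifiers available for each scribe (here, ten per scribe)
     * @param predictor     maps (query page, labelled page -> scribe) to a predicted scribe;
     *                      stands in for the full attribution pipeline
     */
    static double meanTop1Accuracy(Map<String, List<String>> pagesByScribe,
                                   BiFunction<String, Map<String, String>, String> predictor,
                                   int rounds, long seed) {
        Random random = new Random(seed);
        long correct = 0, total = 0;
        for (int round = 0; round < rounds; round++) {
            Map<String, String> labelled = new HashMap<>();      // page -> scribe
            Map<String, List<String>> queries = new HashMap<>(); // scribe -> unseen pages
            for (Map.Entry<String, List<String>> e : pagesByScribe.entrySet()) {
                List<String> pages = new ArrayList<>(e.getValue());
                Collections.shuffle(pages, random);
                labelled.put(pages.get(0), e.getKey());              // one labelled page per scribe
                queries.put(e.getKey(), pages.subList(1, pages.size())); // the rest become queries
            }
            for (Map.Entry<String, List<String>> e : queries.entrySet()) {
                for (String page : e.getValue()) {
                    if (e.getKey().equals(predictor.apply(page, labelled))) correct++;
                    total++;
                }
            }
        }
        return (double) correct / total;
    }
}
```

With 46 scribes, 10 pages each, and 50 rounds, such a loop yields the 414 predictions per round and 20,700 predictions in total reported below.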
The images were cropped in such a way that, if
The 50 iterations of the experimental procedure of the evaluation required roughly 41 hours on a Windows laptop (processor: Intel Core i7-4600U @ 2.10 GHz, maximum heap for the Java Virtual Machine: 6.1 GB). This corresponds to an average of roughly seven seconds per image query. The current implementation of the system is an experimental one and has not been optimized for efficiency.
This exercise produced 20358 true predictions (out of a total of 414 × 50). The system consequently reached a mean top-1 accuracy of 98.3%. We can also look at the scribe attributions for single image components: During the 50 rounds of the experimental procedure, roughly 9.6 million component attributions were made, on average 464 for each page. For the first-step classifications (with all labelled component data available), 4.2 million of these predictions were correct. This gives us 44.0% as the top-1 accuracy for the component-level scribe attribution.
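For reference, both scores follow directly from these counts (the component-level figure uses the rounded totals quoted above):

```latex
\text{page-level top-1 accuracy} = \frac{20358}{414 \times 50} = \frac{20358}{20700} \approx 0.983,
\qquad
\text{component-level top-1 accuracy} \approx \frac{4.2 \times 10^{6}}{9.6 \times 10^{6}} \approx 0.44.
```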
The system allows us to retrieve information about which false predictions were made. This makes it possible to see which images, and thereby which hands, led the system to make mistakes. The erroneous predictions are shown in Table
The errors produced in the 50 rounds of experimental evaluation. 9 × 46 × 50 predictions were made, and 98.3% of them were true; these are the remaining 342 incorrect ones. The total number of errors for each hand is recorded here, as are the numbers of specific erroneous predictions.
Hand | Errors | Erroneous predictions |
---|---|---|
csg0186B | 90 | csg0926: 30, csg0089: 16, csg0186A: 11, csg0557: 9, csg0078: 8, csg0861: 7, csg0053: 3, csg0569: 3, csg0902: 1, csg0562A: 1, csg0562B: 1 |
csg0586 | 59 | csg0726B: 32, csg0990A: 13, csg0602: 8, uubC528: 4, csg0593: 1, csg0644: 1 |
csg0576 | 51 | csg0077: 30, csg0088: 9, csg0078: 7, csg0053: 4, csg0089: 1 |
csg0562B | 47 | csg0053: 32, csg0557: 8, csg0562A: 5, csg0078: 2 |
csg0990A | 34 | csg0990B: 34 |
csg0112 | 26 | csg0053: 25, csg0562A: 1 |
csg0186A | 18 | csg0569: 17, csg0088: 1 |
uubB68 | 9 | csg0725: 8, csg0990A: 1 |
csg0726A | 3 | csg0726B: 3 |
csg0089 | 2 | csg0562A: 1, csg0569: 1 |
csg0557 | 1 | csg0926: 1 |
csg0562A | 1 | csg0053: 1 |
csg0565A | 1 | csg0077: 1 |
The evaluation showed that the system performed well on a data set containing both completely new manuscripts (the Scandinavian ones) and unseen images from the same codicological units as those consulted during the tuning of the system. As studies in medieval scribe attribution are few, and the data sets used in evaluations have had different properties, it is not possible to make a fully fledged comparison of the present system with previous ones as regards their performance as classifiers in scribe attribution. We can, however, see that it delivered a mean accuracy score which is higher than the numbers reported for previous experiments with medieval data, which covered smaller sets of scribes. We will also argue below that the errors of the system are to a large extent “reasonable”. An innovative component of the present system is the module that presents evidence for attributions in a way that invites qualitative inspection of the kind promoted by traditional palaeography.
Some challenges for the present system should be mentioned: A basic difficulty is that manuscripts on which the binarization module performs poorly could be difficult to process in the intended way. Defective binarization would interfere with the extraction of writing components. This situation could, for instance, arise for manuscript images with uneven contrast between background and ink, in particular in combination with damage. Low resolution would be a related kind of problem. As these are common and serious problems in all work with historical manuscripts, they can hardly be seen as indicating specific flaws of the present approach.
Another possible obstacle is that densely connected forms of writing could make it difficult for the component extraction module to find a sufficient number of useful segmentable components. Furthermore, the system is sensitive to rotation of the writing in relation to the digital images. The text lines in the images which have been studied here are roughly parallel with the x-axis. In the evaluation of the system, rotation was consequently not a serious problem. However, some mechanism for correcting image orientation would make the system more robust.
Systems of this kind face many challenges on the path to becoming really useful tools for historians and philologists. One of the most important questions is what happens when the data sets become much larger. The “nearest neighbour” classification is an instance of linear search. The time it takes is proportional to the size of the set of labelled components. This means that some more efficient component classification method will be needed as the data sets grow. Given that the labelled data comprise hundreds of components for each scribe (and each page), it would be possible to estimate which shapes are most strongly distinctive for one or a few scribes, and which ones are more “commonplace”. After that, only the more distinctive shapes would be used as labelled data in the component classification step. This would reduce the time needed for the “nearest neighbour” step and could improve the ability of the system to deal with a larger number of scribes.
The decision to use a size-neutral feature model was guided by a wish to focus on the shape of letters rather than their actual size. (See the discussion of Figure
There is a disputed case among the manuscripts studied here: In the e-codices “Standard description” for Cod. Sang. 603, Von Scarpatetti (
Matched image components for page 468 in the csg0603C hand. The page was attributed correctly (91 component attributions out of 120 support that), but many components were matched with the csg0603B hand. (The table only shows the 56 strongest instances of the 120 component attributions which would decide the verdict on the page.)
As mentioned above, Table
Matched image components for page 148 in the csg0186B hand. The hand csg0186B is the one which was most often misclassified in the evaluation. For the components that we can see here, the associations stay within the Carolingian minuscule script, and within the same grapheme (sequence), but many point in the direction of an erroneous hand. The three top hands for this page as regards the number of votes received are csg0078 (33 votes, in yellow), csg0186B (29 votes), and csg0569 (22 votes), out of a total of 120 votes. (The image only shows the 56 strongest ones.)
The Bastarda scribe csg0586, the second most often misclassified one, gave rise to 59 errors. The letters of this scribe are connected by thick lines in a way that seems to cause an unusually small number of components to be extracted. This probably contributed to the difficulties. However, we see again that these attributions are to scribes using the same kind of script, i.e. other varieties of Bastarda, and to uubC528, which, like csg0586, is characterized as a cursive script. This suggests that a method similar to the one proposed here could be used to address the task of script classification.
The most common specific incorrect attribution (34 cases) is pages from csg0990A being classified as csg0990B. As mentioned above, the two hands represent very similar Bastarda scripts, and worked in the same scriptorium at the same time. A similar situation can be seen as regards the hand csg0562B. It is striking that it was often, in 32 cases to be precise, attributed to csg0053. According to Von Euw (
We have outlined and evaluated an automatic system for identifying the most plausible scribe responsible for the writing found in a manuscript image. The set of known scribes was defined by one manuscript image for each hand in the individual experiments we conducted. The central principle of the system is that scribe attribution is performed as a two-step bottom-up classification procedure. First, the system classifies roughly letter-size components by means of “nearest neighbour” classification, based on shape-related similarity. Secondly, the set of component-level attributions, which typically contains hundreds of elements for a page, is used to predict the page scribe by means of a voting procedure. Both the pairings of similar components and the voting procedure are easy to understand for a user without knowledge about the computational details of the system. This makes it possible to instruct the system to generate a visualized presentation of the evidence for a proposed scribe attribution. This forms a kind of argument which highlights the pairwise similarities between the writing components which were taken to decide the issue. This innovative feature allows the system to provide input to qualitative palaeographic analysis.
The binarization step and the extraction of writing components are motivated by a wish to focus specifically on the writing as ink on the writing support. This idea goes hand in hand with the assumption that writing is, in general, a matter of a bichrome contrast between ink and background. Nevertheless, the design of many medieval manuscripts, including several of those in the current data set, makes artful use of several colours. Colours and their distribution also have a lot to tell about the composition of the ink and the writing material, as well as about the way a manuscript has been handled over the centuries. It is certainly possible to exploit this information in a classification system associating pages with codicological units, and it would most likely be useful for the current data. This would however be another task, one of performing codicological unit attribution based on the full range of information available in manuscript images. This problem is worthwhile and interesting in its own right, but it is something other than scribe attribution based on the writing itself as the visible trace of the scribe’s performance.
The basic principle of the present system, that of performing scribe attribution bottom-up, classifying details first and deriving a verdict on the whole sample from the detail-level attributions, is compatible with further refinement of the modules involved. The binarization module, the component extraction and selection, the feature model, the component classification algorithm, and the voting procedure all invite experimentation with more sophisticated and context-sensitive mechanisms. In particular, we can note that the system treats all writing components in the same way. The examples in Figures
The evaluation data do not present the more challenging task of identifying scribes across different codices, with possibly different scripts, let alone across different languages. Rather, in the data, each scribe is represented by one codex in one language. As illustrated by the examples discussed above, language influences which writing components are likely to be extracted and consequently how they can be matched. To explore the present system for cross-language scribe attribution is an interesting possibility for future research.
Considering that the present system is a simple and straightforward one, it works remarkably well. It attributes scribes to manuscript images with a high degree of correctness, and it has the ability to show us why it counts an image as the work of a known scribe. In order to create a really useful software tool from the ideas that we have exploited here, the system should be equipped with an interface that allows the user to experiment with different modules and parameter settings. Furthermore, as hinted above, systems of this kind should be implemented in a fashion that makes it possible to work with really large collections of manuscripts.
The additional file for this article can be found as follows:
The Medieval book hand data set. DOI:
This work has been carried out in two projects, supported by the Swedish Research Council (Vetenskapsrådet, Dnr 2012-5743) and Riksbankens Jubileumsfond (Dnr NHS14-2068:1), and led by Anders Brun and Lasse Mårtensson, respectively. The current paper is based on a presentation prepared for the conference
The author has no competing interests to declare.