Digital palaeography: using the digital representation of medieval script to support palaeographic analysis


This article shows how the System for Palaeographic Inspections (SPI) software suite developed at the University of Pisa can be used to assist palaeographers in their attempts to classify and identify medieval scripts. Working with a small corpus of Tuscan manuscripts from the tenth to the twelfth century, now owned by the Biblioteca Comunale degli Intronati in Siena, the article shows how the software can be used to characterise the calligraphic ideal for each script in a given manuscript, compare letterforms in different scribes’ work, and define relationships among individual scripts and manuscripts.

The article concludes with a discussion of potential improvements for the SPI system.


palaeography, imaging, System for Palaeographic Inspections (SPI), digital representation

How to Cite

Ciula, A., 2005. Digital palaeography: using the digital representation of medieval script to support palaeographic analysis. Digital Medievalist, 1. DOI:



Palaeography as a historical science

The centrality of script

§ 1 Among the disciplines that study the past through its textual heritage, palaeography focuses on the original script of manuscript books.[1] Although, as Mabillon argued, neither script, nor any other single aspect of a book, is an adequate basis for [final] judgement (quoted in Brown 1993, 19), the goal of the palaeographical method remains the dating and localisation of manuscripts through the analysis of the physical features of their writing.

§ 2 The broader interests of palaeographical research range from the origins and alterations of manuscripts to their owners and uses. The cultural materiality of the manuscript itself (see McGann 2003), however, directs the research. All physical aspects of the book as an individual object with its own history are inseparable from the script and its creation (see Parkes 1991). The techniques used in a manuscript's manufacture, the notes made by its scribes or illuminators, the indications borne by its kalendars and litanies, its provenance and textual tradition, its decoration and illumination: all are valuable guides to palaeographic interpretation.

The palaeographic method

§ 3 Palaeography's most fundamental method is the graphic comparison of one or more manuscripts against a dated (or datable) and localised corpus. This involves a formal and stylistic comparison—as Supino Martini stresses in her statements about the palaeographical method—between what has reached us with a close paternity (or date) and what is supposed to be possibly brought back to the same paternity (or date).[2] It is this comparison (see Unsworth 2000) that enables the palaeographer to discover what is generically similar in disparate samples through the observation of pertinent graphic facts in context. It is this same comparative method that also allows him or her to identify unconsciously idiosyncratic aspects of a scribe's individual style—providing indispensable clues for establishing the identity or non-identity of unknown hands.

§ 4 Thus the discriminating palaeographical eye, trained by experience of observation, the synoptic examination of manuscripts, and the practice of analogy, is able to see order in what might otherwise appear undifferentiated. It is an eye that weaves a logical plot around what seem to be intuitive likenesses or unlikenesses, thanks to what has been called a conjectural paradigm of enquiry.[3] This eye draws clusters a posteriori from the disparate evidence, making available a selected set of observable categories which are in turn useful for future practice (see Gumbert 1976). Its interpretation depends on the features it chooses to highlight: a different selection may alter its understanding of the entire sample to a greater or lesser extent. The cognitive validity of the paradigm depends on the criteria used to establish the pertinent distinctions.

Nomenclature and ontology

§ 5 A palaeographical study begins with the choice of a sample corpus. Although the corpus is chosen to provide an adequate sample of handwriting, the manuscripts in question can be very different in provenance, date, and script. In identifying and classifying graphically significant features, therefore, nomenclature becomes crucial. Indeed, in palaeography, the definition of terminology used to create significant categories is a high priority.[4] The polyphony of the real manuscripts (see Sperberg-McQueen 1991) needs to be abstracted and reduced as consistently and systematically as possible, without constructing arbitrary boxes to squeeze facts into (Gumbert 1976). A consistent system of reference or ontology of information for palaeographic analysis does not exist, however.[5] Its existence would not eliminate doubts and reservations, nor would it accomplish any completeness or exhaustiveness in the representation of such a variety of individual historical realizations. But it would certainly relieve palaeographical studies of some unnecessary confusion and mystery.

The role of facsimiles

§ 6 Publications on palaeographical subjects need to be accompanied by illustrations. Indeed, even the pioneer volumes on palaeography of the seventeenth and eighteenth centuries bore accurate imitative drawings (see Petrucci 1984), while the treatises of the early nineteenth century featured hand-engraved facsimiles. At the end of the nineteenth century, the development of the technique of photography made available full-page reproductions,[6] and it is not by chance that Latin palaeography flourished to new life when photographic facsimiles could make explicit the systematic studies of abbreviations carried out by the philologist Ludwig Traube (see Traube 1907).

§ 7 Generalisation, however, requires abstraction. For the sake of generalisation and dissemination, the original handwriting of the actual manuscripts needs to be reduced to a prototype for classification and study. The leading strategy has been the one of expressing the variations by means of hand-drawn letterforms. The tension between the individual handwritten sample and its abstract model is thus resolved by relying on partial visual representations created ad hoc to match the corpus of analysis. Paradoxically, however, this fundamental process of abstraction causes, once again, a descent back into the subjective representation of what a single palaeographer considers to be the typical alphabet of a given script.

Digital palaeography

Goals and method

§ 8 The undertakings of digitisation, of digital editions and the provision of online facsimiles, are already dealing in part with this fundamental issue of visual representation and with the general problem of access to historical material.[7] The aim of this research is to use the digital representation of book hands as a tool to support palaeographical analysis by human experts. Taking a humanities-computing approach[8] to the traditional study of medieval manuscripts, its purpose is to show how digital representation may help to describe a certain graphic style of handwriting, and how it may help in the comparison of different scripts that are geographically and chronologically related. If the palaeographical comparison between dated and undated codices makes assumptions and hypotheses of correlation based on an individual's expert eye, the possibility to be explored here experimentally is whether the eyesight can be made sharper by the use of a computational instrument. The point is not to replace the inadequacy of graphical comparison by the power of numerical precision or to misrepresent the richness of the assessment of clues by absolute statistical data. Rather it is to explore a different, complementary, methodology.

§ 9 Although the present research is focused on a specific case study, the methodology explored here claims a more general value. What the results have in fact shown is that the digital models produced by the SPI provide considerable support for the analytical description of letterforms and for the systematic comparison of scripts which differ in small details. As a consequence, the nature of the traditional palaeographical method is enriched rather than diminished or undervalued.

§ 10 The methodology explored in the current work could not only be applied to other case studies and corpora, but may also enhance existing projects on image-based digital editions of medieval manuscripts and documents by offering some effective tools for palaeographic interpretation.

The role of quantitative approaches to the study of script

§ 11 To date, quantitative studies in palaeography have been limited, apart from some pioneering but isolated works (e.g. Gilissen 1973; CNRS 1974), to the production and format of books, the so-called codicological examination.[9] Although the morphology of letters is of recognised importance, technical measurement of letterforms, or, in general, any quantitative approach to the study of script, has been received with scepticism.[10] In part, this is because the study of script has only been conceived as systematic typological categorisation, detached from the materiality of text and its implications (Mastruzzo 1995, 461).

§ 12 Within the current research, the value of the trained palaeographic eye is not in doubt. The present project aims to explore the possibility of supporting the human expert by quantifying the graphical signs and providing tools that, rather than diminishing the expert's interpretative insights, support or guide them. The foundational idea is that the examination of specific explicit criteria can help to bring sensible and meaningful order to our understanding of script.[11]

The System for Palaeographic Inspections (SPI)

§ 13 The System for Palaeographic Inspections (SPI) software was developed at the Computer Science department of the University of Pisa by a team of postgraduate students coordinated by Prof. Alessandro Sperduti and Prof. Antonina Starita.[12] It was created to support the identification of similarities among letters belonging to different manuscripts. Unfortunately, its implementation was never completed. Moreover, while the computer scientists tested the feasibility of the software, it never reached a state in which it could actually be used by the History Department of Pisa University, the humanities partner of the original project. The two academic partners ultimately abandoned the realisation of the original idea and the project came to an end.

§ 14 The present research was started with the aim of experimenting with this application, in order to establish a methodology and to produce results that eventually will lead toward a future project involving the author as humanities computing expert and one of the researchers from the University of Pisa as specialist in machine learning. While the concluding paragraphs of the current paper summarise the strengths of the application and methodology in use, they also point to the need to redesign the entire program to make it possible for other scholars to benefit from its use and to envisage further developments.

§ 15 At the moment, the SPI has two main functions:

  1. to show and quantify graphical relationships among manuscripts bearing different styles of handwriting;
  2. to provide objective measures of similarity between manuscripts of unknown date or provenance and a corpus of stored models of localised and dated manuscripts generated by the system.

§ 16 The system consists of a set of modules communicating through a shared database. It can be conceived of as a set of agents that work on different tasks: an agent that is engaged on the segmentation phase, one that manages the generation of models, and one that processes the classification of styles of handwriting. The system is built around a relational database, called the palaeographical database, that contains all the information the application produces and processes:

Figure 1: Schematic of the SPI system.

§ 17 User interaction is limited to two main tasks: inserting bitmap files into the database following a previous phase of capture of the relevant manuscript leaves, and extracting relevant letters (called characters) for the segmentation phase. Once the relevant letters related to a certain graphic unit have been extracted and entered into the database, they may be used either as samples for the generation of a prototype of the given hand—for instance, for the generation of the model of the letter 〈a〉 in a certain manuscript—or as input for the classification module, as described below.

§ 18 While exploring the database, the palaeographer may browse the images of the manuscripts that have been inserted into the system and the corresponding folios; the extracted characters may be visualised, and the settings for the generation of models and classifications can be changed.

Experimenting with SPI

Corpus of manuscripts

§ 19 For the present analysis, a corpus of manuscripts has been chosen and digitised according to specific criteria. Around forty codices (without taking into account the fragments) dating from the 10th to the 12th century, held at the Biblioteca Comunale degli Intronati of Siena,[13] constitute the wider corpus from which the sample manuscripts have been chosen. Previous work in art history (Klange Addabbo 1987) and palaeography (in particular Garrison 1984, and Berg 1968) has not been able to identify a single scriptorium as the common source for this varied group of codices. Because of the diversity of the writing style and the general lack of information on copying activity in the convents, several have been classified as generally belonging to the area of central Italy. Most come from the lost libraries of Benedictine convents and monasteries from the territory around the Tuscan city of Siena, which were confiscated between the eighteenth and the nineteenth centuries.[14]

§ 20 Five codices from this larger corpus have been selected for experimentation by the current author for her master's thesis (Ciula 2003, 2004a, 2004b, and 2004c): FIII3, FIII13, FV2, FV8 and FV21. All were held at the monastery of S. Eugenio just outside Siena in the fifteenth century, as the notes of possession in them attest.[15]

§ 21 This sample includes, above all, liturgical texts. The main text is usually written by more than one scribe, in either one or two columns, and it is often framed by contemporary or later marginal and interlinear glosses.

§ 22 The handwriting in this sample can be placed on a continuum, from the not-yet-formalised Caroline script of the ninth century through intermediate trends to the Gothic hands of the late twelfth century.[16]

§ 23 On the basis of earlier studies concerning these codices, it is possible to identify three groups of scripts:

  1. Non-formalised Caroline handwriting (9th-10th centuries)
  2. Intermediate groups
    1. 1st group: 11th century;
    2. 2nd group: first half of the 12th century;
    3. 3rd group: second half of the 12th century;
  3. Pre-Gothic (early or incipient Gothic script; late 12th-early 13th century).

Outside this approximate classification, there is a group of codices that have been generally dated to the twelfth century, again according to earlier studies.

§ 24 The five codices from the monastery of S. Eugenio studied here belong to three of the five groups: FV8 (end of ninth/beginning of tenth century) and the first part of the codex FV2 (tenth century) belong to group 1; FIII3 (anno 1017) and FIII13 and the second part of FV2 (both eleventh century) belong to group 2a; FV21 (end of the twelfth century) belongs to group 3 (all dates as in Avitabile et al. 1970).

Phases of inspection

§ 25 The following paragraphs will show in more detail how SPI may work during a palaeographical enquiry and can support:

  1. the morphological analysis of the model related to a single letter or group of letters;
  2. the comparative analysis of a predefined set of several models;
  3. the classification of new manuscripts by means of a set of classifiers that work on single letters or groups of connected letters.

§ 26 For SPI to work and produce models of letterforms, it is necessary first to separate the letters out, that is to say to segment the continuous script of the manuscript into individual occurrences of letterforms. Some premises need to be stated to justify such an approach, which treats the single letter form as the basis for morphological analysis.

§ 27 In medieval palaeography it is widely recognised that many variations in handwriting are not random, but consciously produced. Indeed, the medieval scribe had in mind a script to pursue, a calligraphic ideal (see Brown and Lovett 1999). He foresaw what the text should look like. He did this by avoiding variants, so as to achieve a coherent, homogeneous version of the handwriting he intended to perform. This is true of book scripts, and especially of Carolingian script, which had to obey the rules of coherency and formality, legibility and beauty, and hence privileged constructed letterforms, made in several strokes.

§ 28 The segmentation process allows us then to focus on distinctive letterforms, which will vary minimally for the same style of handwriting. Nevertheless, despite the experienced scribe's taught skills, handwriting, like any human activity, varies and is never the same. No letter can be re-written by hand in exactly the same way. This intrinsic variability complicates all automatic image processing of any human handwriting. It is a concrete example of what is called the problem of inconsistency in automatic pattern recognition (for an introduction to document image analysis see Bunke and Wang 1997). From a palaeographical point of view, the issue is to distinguish between the differences due to natural human inconsistency and those due to changes of style.

§ 29 For this study I have chosen not to segment the manuscripts in their entirety, but only the single letters or groups of connected letters that are relevant. The SPI segmentation module, which works quite successfully with non-cursive medieval hands, goes through these steps:

  1. The palaeographer chooses a letter to segment within the digital page from the alphabetic list on the toolbar, and this starts the training of the system.
  2. The area of the image that contains the letter, that is to say the frame that delimits the effective extension of the character, its blob of ink, is then selected, either manually or automatically.[17]
  3. The segmentation algorithm suggests a segmentation for the chosen letter or groups of letters.[18]

Figure 2: Segmentation window of the SPI.

§ 30 The main parameters for the segmentation relate to the type of pattern to be segmented, that is to say to what is called the projection x of a certain character.[19] Letters are conceived as divided into three groups, depending on the expected distribution of the blobs of ink; in particular, the vertical histogram is used to measure the gradient of luminosity of pixels on the palaeographical image[20]:

  1. one-modal characters, such as 〈i〉 and 〈l〉 (or 〈f〉 in the figure below)
  2. two-modal characters, such as 〈d〉 and the ligature 〈st〉
  3. three-modal characters, such as 〈m〉 and the ligature 〈sti〉.

Figure 3: Examples of one, two, and three modal characters with corresponding histograms.
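
The grouping by modality can be illustrated with a minimal sketch. It assumes a bilevel image stored as rows of 0/1 values (1 for ink), sums the ink in each pixel column to obtain a vertical histogram, and counts the separate peaks. The function names, the toy glyphs, and the 50% peak threshold are illustrative assumptions, not the SPI's own code or parameters.

```python
def column_profile(image):
    """Count ink pixels (value 1) in each pixel column of a bilevel image."""
    return [sum(column) for column in zip(*image)]


def count_modes(profile, ratio=0.5):
    """Count runs of adjacent columns whose ink count reaches ratio * max.

    The 0.5 ratio is an assumed value chosen for this toy example.
    """
    peak = max(profile)
    if peak == 0:
        return 0  # blank image: no ink, hence no modes
    threshold = ratio * peak
    modes, inside_run = 0, False
    for value in profile:
        if value >= threshold and not inside_run:
            modes += 1  # a new peak starts here
        inside_run = value >= threshold
    return modes


# Toy glyphs: a single minim (like <i>) and three minims joined at the top (like <m>).
GLYPH_I = [[0, 1, 0],
           [0, 1, 0],
           [0, 1, 0]]
GLYPH_M = [[1, 1, 1, 1, 1, 1, 1],
           [1, 0, 0, 1, 0, 0, 1],
           [1, 0, 0, 1, 0, 0, 1],
           [1, 0, 0, 1, 0, 0, 1],
           [1, 0, 0, 1, 0, 0, 1]]

print(count_modes(column_profile(GLYPH_I)))  # 1 (one-modal)
print(count_modes(column_profile(GLYPH_M)))  # 3 (three-modal)
```

On real scans the histogram would be noisier, so some smoothing of the profile before counting peaks would probably be needed.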


§ 31 Before the second phase of model generation, it is necessary to repeat the segmentation process by selecting a certain number of characters representing the same letter.

§ 32 In the training phase, a centroyd is obtained from the set of characters that have been extracted during the segmentation phase. The centroyd is an average letter, a sort of pure character: in brief, a model. In mathematical terms, a centroyd plus a set of tangents of a certain cardinality compose the prototype. Through its tangents, the model represents an average of the characters belonging to the training set.

Figure 4: Centroyd (left) and tangents (right) for an example letter a.
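
As a rough illustration of what such a prototype might contain, the following sketch computes a centroyd as the pixel-wise mean of a set of flattened character images, and one tangent as the main direction in which the samples vary around that mean. The article does not say how the SPI derives its tangents; estimating one as the first principal direction by power iteration is an assumption made here for illustration, and all names are hypothetical.

```python
from math import sqrt


def centroid(samples):
    """Pixel-wise mean of equally sized, flattened character images."""
    n = len(samples)
    return [sum(pixels) / n for pixels in zip(*samples)]


def first_tangent(samples, iterations=100):
    """Unit vector along the main direction of variation around the centroid.

    Assumption: the tangent is estimated as the first principal direction,
    found by power iteration; the SPI may compute its tangents differently.
    """
    mean = centroid(samples)
    deviations = [[p - m for p, m in zip(s, mean)] for s in samples]
    # Start from the largest deviation so that the result is deterministic.
    vector = max(deviations, key=lambda d: sum(x * x for x in d))
    for _ in range(iterations):
        # Multiply by the (unnormalised) covariance without building the matrix.
        weights = [sum(x * v for x, v in zip(d, vector)) for d in deviations]
        vector = [sum(w * d[i] for w, d in zip(weights, deviations))
                  for i in range(len(mean))]
        norm = sqrt(sum(x * x for x in vector))
        if norm == 0:
            break  # all samples identical: there is no variation to describe
        vector = [x / norm for x in vector]
    return vector


# Three toy "images" of two pixels each that differ only in the first pixel.
SAMPLES = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
print(centroid(SAMPLES))  # [2.0, 1.0]
```

For these toy samples the tangent points along the first pixel only (its sign is arbitrary), since that is the only pixel that varies.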

§ 33 Hence, on the basis of digital images of medieval book hands, the system generates some graphical models related to single letters or to groups of connected letters.


§ 34 Once the models are created, they may be used in two ways: either as objects for the palaeographer to observe, in order to analyse the morphological characteristics of the models that have been generated and saved so far; or as clusters of reference for the system itself to refer to when the classification of manuscripts takes place.

§ 35 The models and, in particular, the figures by which they are represented—the centroyds—may be subjected to a series of transformations. This procedure of graphical transformation is called tangents analysis.

§ 36 More specifically, what the system is able to visualise are projections on the subspace of the model. The common characteristics, that is to say the tangents of the class, remain invariant, while the model is allowed to show its possible changes within its tangents' subspace.[21] In this way, as the deformations of the centroyd along the relative tangents are made visible, it is possible for palaeographers to observe the directions of invariance of the model itself. To sum up: it is possible to stretch the centroyd in one direction or the other (the directions are defined by the specific tangents) so as to highlight the morphological possibilities that the model encompasses. The resulting visual effect is a morphing of the centroyd.

§ 37 A window of the application allows this graphical visualisation of the tangents and shows how the character's prototype changes while its position varies within a determined subset of tangents.

Figure 5: Tangents analysis window of SPI.
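
The morphing effect described above can be approximated by moving the centroyd a small, signed distance along one tangent. The sketch below assumes that both are plain lists of pixel values of equal length; the function names and the idea of a fixed number of evenly spaced frames are illustrative choices, not features documented for the SPI.

```python
def deform(centroid, tangent, alpha):
    """Shift the centroid by the signed amount alpha along one tangent."""
    return [c + alpha * t for c, t in zip(centroid, tangent)]


def morph_sequence(centroid, tangent, extent=1.0, steps=5):
    """Return `steps` evenly spaced deformations from -extent to +extent."""
    if steps < 2:
        return [list(centroid)]
    step = 2 * extent / (steps - 1)
    return [deform(centroid, tangent, -extent + k * step) for k in range(steps)]


# A two-pixel toy model: the tangent only changes the first pixel.
frames = morph_sequence([0.5, 0.5], [1.0, 0.0], extent=0.2, steps=3)
for frame in frames:
    print(frame)
```

With an odd number of steps the middle frame is the undeformed centroyd, and the frames on either side show the two opposite directions of stretching.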


§ 38 It is always possible to display and to browse the models that have been generated and stored. However, the complete diagram and dendogram tools facilitate some interesting inspections based on comparison of the stored models.

Complete diagram

§ 39 The complete diagram is a graph that defines the relationships among different hands by calculating the distances among the stored models. This makes it possible to visualise the models according to their degree of similarity to a selected model.

Figure 6: Diagram of relationships among samples of an example letter b.

§ 40 A preliminary phase is necessary during which the software computes the matrix of distances among the models taken in pairs. A measure of distance, which represents the degree of similarity, is then associated with every model compared to the other ones and it appears at the top of each model frame. The same calculation of distance is used by the next tool, the dendogram.
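
A minimal sketch of this preliminary computation is given here, assuming that each model can be reduced to a vector of pixel values and that similarity is measured by plain Euclidean distance. The article does not name the metric the SPI uses, so the metric, the function names, and the toy data are all assumptions.

```python
from math import dist


def distance_matrix(models):
    """Pairwise Euclidean distances between model vectors, keyed by name."""
    return {a: {b: dist(models[a], models[b]) for b in models} for a in models}


def rank_by_similarity(models, selected):
    """Other models sorted from most to least similar to the selected one."""
    row = distance_matrix(models)[selected]
    return sorted(((name, d) for name, d in row.items() if name != selected),
                  key=lambda pair: pair[1])


# Hypothetical two-pixel models; real models would be flattened character images.
MODELS = {"hand_A": [0.0, 0.0], "hand_B": [3.0, 4.0], "hand_C": [1.0, 0.0]}

print(rank_by_similarity(MODELS, "hand_A"))
# [('hand_C', 1.0), ('hand_B', 5.0)]
```

Changing the `selected` argument reorders the same set of models around a different starting point, which is the effect of choosing a different model at the start of the diagram.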


§ 41 The dendogram is based on a clustering algorithm that splits the set of models into subsets, and orders them in a hierarchical structure of clusters. Every cluster represents a subset of models sharing a certain morphological similarity. The result of such a computation is visualised as a binary tree, which is the dendogram itself.

Figure 7: Dendogram of an example letter f.
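
How a binary tree of this kind can be derived from pairwise distances is sketched below. The article does not identify the clustering algorithm used by the SPI; this example assumes a simple bottom-up (agglomerative) procedure with average linkage and Euclidean distance, which is one common way of building such a tree. The names and data are hypothetical.

```python
from itertools import combinations
from math import dist


def build_tree(models):
    """Merge the two closest clusters repeatedly; return nested tuples of names."""
    # Each cluster is a pair: (tree built so far, flat list of member names).
    clusters = [(name, [name]) for name in sorted(models)]

    def linkage(members_a, members_b):
        # Average linkage: mean distance over all cross-cluster pairs (assumed).
        total = sum(dist(models[x], models[y])
                    for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical one-pixel models: two similar hands and one outlier.
MODELS = {"hand_A": [0.0], "hand_B": [1.0], "hand_C": [10.0]}

print(build_tree(MODELS))
# ('hand_C', ('hand_A', 'hand_B'))
```

Each nested pair corresponds to one branching point of the tree: here the two similar hands are joined first, and the outlier is attached to them only at the root.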


§ 42 Besides being useful for palaeographical analysis and for graphical comparison, the models are the references for the last phase, that is the effective classification phase, as briefly anticipated above. (Classification of manuscripts here means the retrieval from the database of the models that are most similar to the hand that needs to be classified or defined from a morphological point of view.)

§ 43 During the operative phase of classification, the palaeographer extracts some samples of letters from the subject manuscript, this time not for training the system but for testing its classificatory skills and interpreting the results. At this point, the system

  1. applies transformations to the samples, establishing the possible variants of the pure character or centroyd;
  2. compares characters extracted from the subject sample with stored prototypes from the comparison sample on which the training phase had been performed[22];
  3. retrieves manuscript models which are morphologically similar to the subject characters.

§ 44 This kind of classification is known as committee classification. At the end of the calculation, that is after the set of characters chosen as samples has been processed, the classification selects the graphical unit which has the most winning comparisons:

Figure 8: Committee classification of an example letter 〈a〉.
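
The voting principle can be sketched as follows: each extracted character is compared with every stored prototype, the nearest prototype wins that comparison, and the graphical unit with the most wins is returned. This is a simplified sketch under stated assumptions: it uses Euclidean distance to the centroyd only, and it omits the transformations that the system applies to the samples beforehand. The names and data are hypothetical.

```python
from collections import Counter
from math import dist


def committee_classify(samples, prototypes):
    """Let each sample vote for its nearest prototype; return winner and tally."""
    if not samples:
        raise ValueError("at least one sample character is required")
    votes = Counter(
        min(prototypes, key=lambda unit: dist(sample, prototypes[unit]))
        for sample in samples
    )
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)


# Hypothetical prototypes for two graphical units and three sample characters.
PROTOTYPES = {"unit_1": [0.0, 0.0], "unit_2": [10.0, 10.0]}
SAMPLES = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]

print(committee_classify(SAMPLES, PROTOTYPES))
# ('unit_1', {'unit_1': 2, 'unit_2': 1})
```

Returning the full tally alongside the winner lets the palaeographer see how close the vote was, rather than only the final attribution.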

Digitisation method

§ 45 In applying SPI to the corpus of manuscripts under study, the first stage to define and perform is obviously the digitisation process: for every codex belonging to the subject corpus, four pages were scanned for each style of script. For manuscripts showing a homogeneous style, this meant four pages per manuscript; in more heterogeneous manuscripts, four pages were scanned per style. Pages were scanned at 300 dpi and an archival collection was built on CD-ROM in TIFF format for preservation purposes. Image files used in the SPI were converted into bitmaps with two levels of bit depth, as required by the program. The digital leaves were cropped and introduced into the database in sections corresponding wherever possible to manuscript columns so as to facilitate image management and further analysis.
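
The conversion to two-level bitmaps can be pictured with a very small sketch. It assumes that "two levels" means a bilevel (ink or background) image, that the scan is available as rows of 8-bit grey values, and that a fixed global threshold is adequate; the threshold of 128 is an arbitrary illustrative value, and the article does not describe how the conversion was actually carried out.

```python
def binarise(grey_rows, threshold=128):
    """Map 8-bit grey values to 1 (ink, darker than threshold) or 0 (background)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in grey_rows]


# A toy 2 x 2 scan: dark values become ink, light values become background.
print(binarise([[0, 200], [127, 128]]))
# [[1, 0], [1, 0]]
```

In practice a single global threshold tends to struggle with stained or unevenly lit parchment, which is one reason why additional filters for difficult images would be useful.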


§ 46 After the setting up of the digitisation process, it has been possible to perform the segmentation for every page, starting with the letter 〈a〉 (the evolution of which is rather emblematic in the development of the Caroline handwriting) and continuing with 〈d〉, 〈f〉, 〈g〉, 〈m〉, 〈n〉, 〈p〉 and 〈u〉. It would have been ideal to perform the segmentation and subsequent phases for all characters in the alphabet, ligatures, and, eventually, punctuation. The available version of the software does not allow this at the moment, however.

Analysis of letterforms

§ 47 Once the relevant letters had been segmented and automatic models generated, qualitative interpretations of the quantitative representations were elaborated, borrowing terms and methods from the palaeographical literature and intertwining them with the new methodology based on the digital models generated by SPI. The following section shows these results for the letter 〈a〉 in FIII3.

〈a〉 (FIII3)

Figure 9: Examples of the letterform 〈a〉 (as extracted characters and as centroyd derived from them in the top left corner).

§ 48 The first basic stroke of the letter 〈a〉 is the shaft or back. In this hand, it slopes to the left. It may also have an upcurving endstroke at the bottom that often links with the following letter. The top, when not linking with the previous letter (e.g. in 〈ea〉 or 〈ta〉), is shortened, making what is nearly a single-compartment form. The bow is quite triangular and flattened. Its shape is thin at the top and bottom and, by light contrast, thick in the middle of the stroke.

§ 49 An analysis of the first tangent shows that the endstroke of the shaft may be either reduced or extended upwards, so as to link with the following letter on the left side. This analysis also demonstrates the existence of an inconsistency either in the slant of the back (which may be more or less straight), or in the module related to the height of the 〈a〉 (which may be taller or shorter):

Figure 10: Analysis of the first tangent for the letterform 〈a〉.

§ 50 On the other hand, the second tangent, which features a fixed slight leftwards-slant, shows a variation related to the width of the letter, either more or less flattened and, consequently, more or less tending towards a single-compartment shape:

Figure 11: Analysis of the second tangent for the letterform 〈a〉.

§ 51 Both tangent variations highlight the corresponding differences in the shape of the counter bow, narrowed or wider depending on the changes in the whole letterform.

Comparison of letterforms

§ 52 After the analysis of letterforms, followed by the generation of the corresponding models and the observation of their variants, a comparison among the various manuscripts was carried out with the support of the diagram and dendogram tools. For interpreting the measures, it has been useful to compare different structures of the diagram, which can be obtained by changing the model at the start of the linear measurement.[23] Whenever possible, connections have been stressed among models of different letters, as shown in the following example for 〈d〉 (compared to 〈b〉). However, some apparent inconsistency in the disposition of the models within the graphs is caused by hands that are not quite formalised, where the broadness of the quill itself may vary, as is especially the case for FIII3 and FV8.

〈d〉 (vs. 〈b〉)

§ 53 Variations in the morphology of the letter 〈d〉, limited here to the straight 〈d〉 and excluding the features of the Uncial 〈d〉 (which is present in all the graphic units except for FV2, 1st part), are affected especially by the shape of the lobe, which may be rather small and flat, or large and round. In addition, the letterform can be accentuated by a contrastive execution, as happens for the letterform 〈b〉. The serif of the ascender may feature a club rather than a wedge shape, or a line serif rather than a beak. These variations often occur within the same manuscript and may confirm that the script was not yet canonised.

Figure 12: Diagram of relationships among samples of an example letter 〈d〉.

§ 54 This diagram is quite exemplary, since the first order shown seems to offer a gradual variation in the slant of the shaft, from the rightwards-trend in the first model of FV2 (1st part), to the slight leftward-position in FV21 (88.05). Following the same direction, the lobe seems to grow in module, at the expense of the shorter shaft, and to acquire contrast. It is interesting to note that the same phenomenon, related to the slope, to the module of the lobe and to the shading, has also been noted in the analysis of the letterform 〈b〉 (see figure 6 above), with the slight variation of the swapped position of FV2 (2nd part) and FIII3.

§ 55 As is again true of the letter 〈b〉, FV2 (1st part and 2nd part), FV8 and FIII3 are clustered together on the left of the dendogram, while FIII13 and FV21 are confined to the right side of the graph. According to the measures shown by the diagram, however, FIII3 is this time closer to the second group containing FIII13 and FV21, due to the thickness of its execution and to the wideness of the lobe; while FV2 (1st part) is in its own branch on the left, because of the accentuated inclination of its shaft:

Figure 13: Dendogram of an example letter 〈d〉.

§ 56 The following figure shows a comparable analysis for 〈b〉:

Figure 14: Dendogram of an example letter 〈b〉.


§ 57 The classic subjects of palaeographical analysis include the graphic elements which might contribute to the definition of a script's calligraphic ideal: shape, module, ductus, writing angle, hatching, ligatures, etc. The SPI is able to compute features expressed directly by the letterforms themselves. These morphological parameters can be computed and recorded on the digital representation. In doing so, they widen the classical system of palaeographical characteristics and facilitate a terminological categorisation that is graphically based. The result is a quantitative formal-analytical approach with a considerable representational and descriptive power. At this point, besides the future development of the current experimentation, it is useful to highlight the methodological and technical limitations of such an approach.


§ 58 While concentration on the minute features of letterforms and their models is enhanced by the use of this computational tool, the overall appearance of the script and its immediate context is not taken into account by the SPI. From the segmentation process onwards, where the cropping of characters is still accomplished within the digital image-page or within a section of it, all processing by the SPI focuses on the individual letters, ignoring the fundamental setting of the script, from the alignment of letters on up. This new methodology is therefore best used when it can be supported by the traditional discipline of integral palaeography (see Boyle 1984, xv) and an analysis of the material context, from the appearance and layout of the page to the overall archaeology of the codex.

§ 59 Besides these methodological considerations, the SPI is also affected by several technical limitations. To ensure greater longevity for the application (see O'Donnell 2004), the following improvements could be carried out:

  1. the software runs only on Windows 98; it should be made to work on other platforms;
  2. the graphical interface has been conceived by computer scientists and needs to be redesigned according to the main principles of accessibility and usability[24];
  3. the segmentation works well, but would be improved by the provision of additional filters for the processing of damaged or otherwise difficult-to-process manuscript images:

    Figure 15: Damaged manuscript pages and distortions that occurred during capture.

    Figure 16: Difficulties in the script (biting, ruling, interlinear glosses, conflicting lines).

  4. the alphabetical grid should be widened and, eventually, made flexible so as to accept the addition of specific letters and ligatures;
  5. the fields that can be filled by descriptive notes should be marked up, so as to make possible better terminological accuracy and a structured search function;
  6. it should be possible to automatically compare diagrams and dendrograms and to display more than one at the same time.

Future research

§ 60 This experiment with the SPI has been successful in that it has shown how the traditional qualitative palaeographic paradigm can be strengthened and assisted by the creation of graphic models that are quantitative in nature but can be arranged visually on the screen. This allows the program to show the infinite variations of handwriting and the interpretations connected to them, opening the analysis to criticism and reasoning. It has demonstrated, in short, how quantitative models can be created that call for qualitative exegesis.

§ 61 The palaeographical comparative method is enriched and changed. Yet, from the beginning, terminological coherence is needed to carefully describe the models and to compare them. Moreover, the comparison of many models allows us to see the data from new perspectives, by structuring the corpus using multiple criteria. Finally, the interpretations are made concrete and precise by both digital representations and quantitative measures.

§ 62 The testing of SPI on the specific corpus of manuscripts held in Siena has produced detailed descriptions of the letterforms and of the handwritings under examination, descriptions that may lead to further interpretation of the unknown origin of the codices and of the development of Caroline minuscule. However, the methodology, its advantages and limits, can be generalised and the combination of classical palaeographical analysis and digital models applied to new case studies for the analysis and identification of other book hands or non-cursive documentary hands. An ideal database could actually include samples and models of various Western European handwritings, with punctuation and other signs, so as to facilitate the work of scholars looking for similar origins and provenances, for clues to copying activity and to the circulation of culture.


[1]. For a thorough introduction on palaeography see Bischoff 1990.

[2]. confronto formale e stilistico fra quanto ci è giunto con paternità (o data) vicina e quanto si presume possa essere ricondotto alla stessa paternità (o data) (Supino Martini 1995, 18).

[4]. See the landmark first Colloque International de Paléographie (Bischoff et al. 1954). Actually, the matter of nomenclature has not always been considered a priority, and it has been approached from different and often irreconcilable perspectives and attitudes. See, for instance, the debate mentioned in Gumbert 1976 between a historical and a Cartesian (meaning abstract) approach.

[5]. For the ontology of information designed for the analysis of old Roman cursive see Terras and Robertson 2004.

[6]. For a bibliography on the limits of reproduction as secondary evidence compared to verbal description see Tanselle 1989 and the brief comments in Petrucci 1991, 14.

[7]. For a survey and discussion of the future developments of electronic scholarly editions towards the use of enhanced image-pages see Robinson 2004.

Examples of projects which have envisaged comprehensive online access to palaeographic materials from different perspectives are CDFP (Wooley et al. 2002, and Cuneiform Digital Forensic Project 2004), CEEC (Codices Electronici Ecclesiae Coloniensis), and MANCASS C11 2004.

Unfortunately, the granularity of digital page-images does not always satisfy the palaeographer's need for close examination, nor for comparison among related manuscripts.

[8]. For the definition of the discipline see two issues of Computers and the Humanities (36.1 and 36.3, 2002) and McCarty 2002.

[9]. In fact, as is known, a discipline named codicology claims its independence as the study of the material aspects of books other than handwriting.

[11]. Explicit criteria are intended here not as rigid conceptual containers, but as generative types as opposed to l'impression globale. Several palaeographical statements have called for transparency over the years; see for instance D'Haenens 1975, and Rushforth 2004.

[12]. The development of the application has been described in some dissertations and in a scientific paper by Aiolli et al. 1999.

[13]. The only existing catalogue, not exhaustive and rather concise, but quite useful for a first study, is Avitabile et al. 1970. Some codices are listed also by Cao et al. 1996.

[14]. The suppression of religious houses in Tuscany in the Napoleonic period is discussed in Biagianti 1985.

[15]. A historical introduction to the monastery can be found in Kurze 2002.

[16]. The palaeographical literature confines Caroline script to the period from the late eighth century, when experimental books were written in several scriptoria influenced by Charlemagne's cultural renovation, through the twelfth. Yet, with the exception of Insular, Visigothic, and, in Southern Italy, Beneventan scripts, Caroline minuscule became the universal type for books and documents all over Western and Central Europe by the middle of the ninth century.

According to the classic nomenclature (see Bischoff et al. 1954, Bischoff 1990, Cencetti 1997 and Derolez 2003), Gothic script (late eleventh to the early sixteenth centuries) overlaps in part with the period of Caroline minuscule, while the Protogothic phase ranges from the late eleventh to the late twelfth centuries. In actual fact, however, different names have been given to the transformation from Carolingian towards the new Gothic script: Late Caroline, Pregothic, Protogothic, Primitive Gothic, etc. But even if Pregothic is nothing but a Carolingian script with some new features (Battelli 1949, Bischoff et al. 1954, Cencetti 1997, Derolez 2003, Petrucci 1968, and Zamponi 1988), Italy represents a singular case. The Gothic book hand of the eleventh and twelfth centuries there features a particular round character that seems to have represented a smooth passage directly from Caroline script to the fully-developed Italian Gothic book hand of later centuries, without any evident Pregothic stage. Therefore, in Italian manuscripts, the distance between Caroline and primitive Gothic script may be rather imperceptible.

[17]. The identification of the box may be made by automatic selection, in which the system searches for the optimal local frame after the mouse is positioned in the approximate middle of the letter, or by manually drawing a box around the letter of interest and adjusting the size of the frame with the mouse. The automatic selection is achieved by a technique independent of the morphology of the character, based exclusively on the local approximation of the letter's dimensions using criteria related to the type of handwriting, such as the thickness of the stroke, the module, and the modular rapport between the height and the width of the letter. Even if the cropped frame contains part of other letters, the following phases are not compromised. It is instead of crucial importance that the frame encloses the whole character in question, including linkages with any connected letters.

[18]. In the latest version of SPI the palaeographer can access some variations of the minimal segmentation process. These variations result from adding vertical segments to the main automatic segmentation. At the beginning, therefore, only a minimal segmentation is shown. Starting from there, alternative segmentations can be derived and proposed to the palaeographer together with the minimal solution and the solution suggested by the system. Finally, it is up to the expert to opt for the best segmentation.

[19]. The projection of a character on a straight line of gradient q is defined as the sum of the image elements along a family of straight lines perpendicular to it. The projections provide a proper indication of the presence of an object in an image: they indicate where it is located and what its extent is.
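
The definition in this note can be made concrete with a short sketch. It is not the SPI's implementation, which the article does not reproduce: the function name `projection`, the use of an angle in degrees in place of the gradient q, the rounding of positions to whole bins and the toy 5 x 5 glyph are all assumptions made for illustration.

```python
from math import cos, sin, radians


def projection(image, angle_deg):
    """Project an image onto a straight line at `angle_deg` to the x-axis.

    `image` is a list of rows of pixel values (0 = background). Each pixel is
    assigned to a bin by its rounded coordinate along the line, so pixels that
    lie on the same perpendicular are summed together. The returned list runs
    from the first to the last non-empty bin.
    """
    theta = radians(angle_deg)
    bins = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                t = round(x * cos(theta) + y * sin(theta))
                bins[t] = bins.get(t, 0) + value
    if not bins:
        return []  # blank image: no object is present
    return [bins.get(t, 0) for t in range(min(bins), max(bins) + 1)]


# Toy binary glyph (assumption): a shaft on the right with a lobe to its left.
glyph = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
]

columns = projection(glyph, 0)    # sums along vertical lines
rows = projection(glyph, 90)      # sums along horizontal lines
print(columns, rows)
```

The length of each profile gives the extent of the object along the chosen line, and its peaks show where the ink is concentrated, for instance the column occupied by the shaft.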

[20]. For those letters or groups of connected letters whose morphology is highly dependent on the individual hand, and which are therefore expected to have a variable histogram (as is often the case with the letter 〈g〉, or when vertical 〈d〉 alternates with 〈d〉 with a sloping ascender), it is necessary to change modality. However, the current interface does not provide an option for this change.

[21]. In mathematical terms, the manifold obtained as the transformations are applied to the pattern is approximated by a linear space, or tangent space, which has the same number of dimensions as the number of defined transformations.
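
A tangent-space approximation of this kind can be sketched as follows. This is a generic illustration and not the SPI's code: the single transformation (a horizontal shift), the finite-difference estimate of its tangent vector, the one-dimensional toy pattern and the function names are all assumptions. The distance computed is the one-sided variant, that is, the distance from a sample to the linear space spanned by the tangent vectors at the pattern.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalise(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = sqrt(dot(w, w))
        if norm > eps:  # skip vectors already contained in the span
            basis.append([wi / norm for wi in w])
    return basis


def tangent_distance(pattern, tangents, sample):
    """Distance from `sample` to the space pattern + span(tangents)."""
    residual = [s - p for s, p in zip(sample, pattern)]
    for b in orthonormalise(tangents):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return sqrt(dot(residual, residual))


def shift_tangent(pattern):
    """Tangent vector of a small rightward shift, by central differences."""
    padded = [0.0] + list(pattern) + [0.0]
    return [(padded[i] - padded[i + 2]) / 2 for i in range(len(pattern))]


# Toy one-dimensional "image" (assumption): a single stroke profile.
pattern = [0.0, 1.0, 2.0, 1.0, 0.0]
tangent = shift_tangent(pattern)

# A sample that differs from the pattern only by a small shift.
sample = [p + 0.3 * t for p, t in zip(pattern, tangent)]

euclid = sqrt(sum((s - p) ** 2 for s, p in zip(sample, pattern)))
td = tangent_distance(pattern, [tangent], sample)
print(euclid, td)
```

With one defined transformation the tangent space has one dimension, as the note states. In this toy case the sample differs from the pattern only along that tangent direction, so the ordinary Euclidean distance is clearly non-zero while the tangent distance is close to zero.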

[22]. The stored models have to be activated and they have to be compatible with the examined characters from the subject sample: same type of letter, same image format, and same number of tangents.

[23]. In the application, diagrams and dendrograms feature tag titles, which show the association between the model and the corresponding manuscript or graphic unit.

[24]. This is one of the inconveniences of testing an application after its implementation rather than during it. Indeed, the accessibility and usability design process ought to be intimately interconnected with the building of the application itself.

Works cited

Aiolli, F., M. Simi, D. Sona, A. Sperduti, A. Starita, and G. Zaccagnini. 1999. SPI: a System for Palaeographic Inspections. AIIA Notizie vol. 4: 34-38.

Avitabile, L., M.C. Di Franco, and V. Jemolo. 1970. Censimento dei codici dei secoli X-XII. Studi medievali 11.2: 1075-1101.

Brown, Julian. 1993. A palaeographer's view. The selected writings of Julian Brown. Janet Bately, Michelle P. Brown, and Jane Roberts, eds. London: Harvey Miller.

Battelli, Giulio. 1949. Lezioni di paleografia. Città del Vaticano: Pontificia scuola vaticana di paleografia e diplomatica.

Berg, Knut. 1968. Studies in Tuscan twelfth-century illumination. Oslo: Universitetsforlaget.

Biagianti, Ivo. 1985. La soppressione dei conventi in età napoleonica. In La Toscana nell'età rivoluzionaria e napoleonica. Ivan Tognarini, comp. 443-470. Napoli: Edizioni Scientifiche Italiane.

Bischoff, B., G. I. Lieftinck, and G. Battelli. 1954. Nomenclature des écritures livresques du IXe au XVIe siècle. Paris.

Bischoff, Bernhard. 1990. Latin palaeography. Antiquity and the Middle Ages. Trans. Dáibhí Ó Cróinín and David Ganz. Cambridge: Cambridge University Press [Paläographie des römischen Altertums und des abendländischen Mittelalters. Berlin 1979].

Boyle, Leonard E. 1984. Medieval Latin palaeography: a bibliographical introduction. Toronto: University of Toronto Press.

Brown, Michelle P., and Patricia Lovett. 1999. The historical book for scribes. London: The British Library.

Bunke, H., and M.S.P. Wang, eds. 1997. Handbook of character recognition and document image analysis. Singapore: World Scientific Publishing Company.

Cao, G. M., T. Catallo, M. Curandai, E. Di Mattia, P. E. Fornaciari, E. Peruzzi, and F. Santi, comp. 1996. Catalogo dei manoscritti filosofici nelle biblioteche italiane. VIII: 101-134. Firenze: Olschki.

CDFP (Cuneiform Digital Forensic Project), University of Birmingham, February 2004.

CEEC (Codices Electronici Ecclesiae Coloniensis), University of Köln.

Cencetti, Giorgio. 1997. Lineamenti di storia della scrittura latina. 2nd ed. Bologna: Patron.

Ciula, Arianna. 2003. Computational suggestions to palaeographical analysis. Lamusa 3.

───. 2004a. A research project. The application of SPI Software to the Corpus of Manuscripts held in Siena. King's College London, London, UK.

───. 2004b. Digital palaeography. In Proceedings of Digital Resources for the Humanities (Newcastle - UK, September).

───. 2004c. Modelli digitali di scrittura carolina. Gazette du livre médiéval 45 (autumn): 27-38.

CNRS. 1974. Les techniques de laboratoire dans l'étude des manuscrits. Colloques Internationaux du CNRS 548 (Paris, September 1972).

Computers and the Humanities 36.1 and 36.3 (2002).

Costamagna, Giorgio, Leon Gilissen, Françoise Gasparri, and Alessandro Pratesi. 1995. Commentare Bischoff. Scrittura e Civiltà 19: 321-352.

D'Haenens, Albert. 1975. Pour une sémiologie paléographique et une histoire de l'écriture. Scriptorium 29: 175-198.

Derolez, Albert. 2003. The palaeography of Gothic manuscript books from the twelfth to the early sixteenth century. Cambridge: Cambridge University Press.

Garrison, Edward B. 1984. Studies in the history of mediaeval Italian painting. Vols. I-IV. Firenze: L'Impronta.

Gilissen, Leon. 1973. L'expertise des écritures médiévales. Recherche d'une méthode avec application à un manuscrit du XIe siècle: le Lectionnaire de Lobbes, codex Bruxelliensis 18018. Ghent: E. Story-Scientia.

Gilissen, Leon. 1975. Ductus et rapport modulaire. Scriptorium 29: 235-244.

Ginzburg, Carlo. 1979. Spie. Radici di un paradigma indiziario. In Crisi della ragione, comp. A. Gargani, 59-106. Torino: Einaudi.

Gumbert, J. P. 1976. A proposal for a Cartesian nomenclature. In Essays presented to G.I. Lieftinck, IV: miniatures, scripts, collections (Litterae Textuales), ed. J.P. Gumbert and M.J.M. De Haan, 45-52. Amsterdam: A.L. Van Gendt.

Klange Addabbo, Bente. 1987. Codici miniati della Biblioteca comunale degli Intronati di Siena. Siena: Edisiena.

Kurze, Wilhelm. 2002. I monasteri nella diocesi di Siena fino al XII secolo. In Atti del Convegno di studi Chiesa e vita religiosa a Siena. Dalle origini al grande giubileo (Siena - Italy, October 2000), eds. Achille Mirizio and Paolo Nardi, 49-64. Siena: Cantagalli.

MANCASS C11 Database, The Manchester Centre for Anglo-Saxon Studies. April 2004.

Mastruzzo, Antonino. 1995. Ductus, Corsività, Storia della Scrittura. Scrittura e Civiltà 19: 403-464.

McCarty, Willard. 2002. Humanities computing. In The Encyclopedia of Library and Information Science. New York: Marcel Dekker.

McGann, Jerome. 2003. Textonics: literary and cultural studies in a quantum world. In The culture of collected editions, ed. Andrew Nash, 245-260. Basingstoke: Palgrave Macmillan.

O'Donnell, Daniel Paul. 2004. The Doomsday Machine, or, "If you build it, will they still come ten years from now?": what medievalists working in digital media can do to ensure the longevity of their research. The Heroic Age 7.

Ornato, Ezio. 1975. Statistique et Paléographie: peut-on utiliser le rapport modulaire dans l'expertise des écritures médiévales? Scriptorium 29: 198-234.

Parkes, M. B. 1991. Scribes, scripts and readers: studies in the communication, presentation and dissemination of medieval texts. London: Hambledon Press.

Petrucci, Armando. 1968. Istruzioni per la datazione. In the introduction to Censimento dei codici dei secoli X-XII. Studi medievali 9.2: 1115-1126.

───. 1984. La scrittura riprodotta. Scrittura e Civiltà 8: 263-267.

───. 1991. La scrittura descritta. Scrittura e Civiltà 15: 5-20.

Pratesi, Alessandro. 1977. A proposito di tecniche di laboratorio e storia della scrittura. Scrittura e civiltà 1: 199-209.

Robinson, Peter. 2004. Where we are with electronic scholarly editions, and where we want to be, Forum Computerphilologie, 24 March.

Rushforth, Rebecca. 2004. Review of The palaeography of Gothic manuscript books from the twelfth to the early sixteenth century, by Albert Derolez. The Library 5.2: 204-206.

Sperberg-McQueen, C. M. 1991. Text in the electronic age: textual study and text encoding with examples from medieval texts. Literary and Linguistic Computing 6.1: 32-46.

Supino Martini, Paola. 1995. Sul metodo paleografico: formulazione di problemi per una discussione. Scrittura e Civiltà 19: 5-29.

Tanselle, G. Thomas. 1989. Reproductions and scholarship. Studies in Bibliography 42: 26-55.

Terras, Melissa, and Paul Robertson. 2004. Downs and acrosses: textual markup on a stroke level. Literary and Linguistic Computing 19.3: 397-414.

Traube, Ludwig. 1907. Nomina sacra: Versuch einer Geschichte der christlichen Kürzung. München: C. H. Beck.

Unsworth, John. 2000. Scholarly primitives: what methods do humanities researchers have in common, and how might our tools reflect this? Symposium on humanities computing: formal methods, experimental practice, King's College London, 13 May 2000.

Wooley, S.I., T.R. Davis, N.J. Flowers, J. Pinilla-Dutoit, A. Livingstone, and T.N. Arvanitis. 2002. Communicating cuneiform: the evolution of a multimedia cuneiform database. Visible Language 36.3: 308-324.

Zamponi, Stefano. 1988. Elisione e sovrapposizione nella littera textualis. Scrittura e Civiltà 12: 135-176.



Arianna Ciula (King’s College London/Università degli Studi di Siena)





Creative Commons Attribution 4.0


Peer Review

This article has been peer reviewed.
