1. Objectives and background

§ 1 The present paper has two objectives. The first objective is to develop a digital diplomatic approach for managing intra-formula variation in substantial formulaic documentary datasets in order to enable complex historical-philological corpus studies. This involves a supervised quantification of formula elements into variables that can be submitted to unsupervised statistical classification analysis (here cluster analysis) and subsequent visualization as text reuse templates and graphs that illustrate diachronic and diatopic change. The workflow to be developed is expected to be scalable to other documentary corpora originating from different historical contexts.

§ 2 The second objective is to evaluate the usability of the approach developed by testing it on a real use case. This use case consists of two early medieval Latin formulae, the so-called constat and manifestus clauses, which can both open the dispositive part of a charter by conveying a declaration “it is certain/manifest.” The variation of the constat and manifestus clauses was chosen for examination because these formulae appear in a wide variety of document types, partly competed with each other, and are known to display chronological and geographical variation within Tuscian charters (Ghignoli and Bougard 2011, 283–284; Ghignoli 2007, 44–46). The two formulae will be examined within a corpus of 1,283 (mostly private) Latin charters written in Tuscia in the eighth to tenth centuries (see section 2). The sentences in (1) and (2) present a constat clause and a manifestus clause, respectively. The formulaic parts are underlined.

(1) consta me Aufrid vir devotus hanc die vendedisset et vendedi, tradedisset et tradedi vobis Aunuald, Teutpald … ortu meum quem avire videor ante sancto Selvestre (ChLA 1.896c, AD 720, Lucca)

“it is certain for me, Aufrid, vir devotus, to have sold – and I did sell – and to have handed over – and I did hand over – to you, Aunuald, Teutpald … my orchard that I appear to have in front of [the church of] St. Sylvester”

(2) manifestum est mihi Racchulo clerico, filio quondam Baruccioli, abitatori ad ecclesiam sancti Elari, ubi dicitur ad Crucem, quia per hanc cartulam offero me ipsum Deo et tibi ecclesiae beatae sanctae Mariae sitae in Sexto (ChLA 1.1027, AD 772, Lucca)

“it is manifest to me, Racchulo, clerk, son of the late Barucciolo, resident at the church of St. Hilary in the place called Cruce, that by this charter I offer my person to God and to you, the church of blessed St. Mary, located in Sexto”

§ 3 Documentary formulae are a means of organizing and structuring written communication, and they aim at the formal validation of the transaction recorded in the document according to accepted legal and other socio-cultural conventions. While charters of the same place and period normally use the same or similar formulae to express similar acts and intentions, formulae do change in time and place in response to changing social circumstances and legal policies (e.g., Gervers 1997, 456); this change has been fruitfully investigated focusing on specific diplomatic parts of documents, such as the arenga (Fichtenau 1957, ch. 3) and the invocation as well as the so-called devotion formulae (Fichtenau 1977, 38–61). However, even the “same” formulae are seldom identical in two charters of the same place and period. Early medieval Italian scribes seemingly sought to utilize current mainstream formulae that they obviously considered to be authoritative and legally binding in terms of their sense, although a verbatim adherence to them was not regarded as obligatory. Scribes did not copy charters from formulary books but reproduced the wordings from memory or checked unusual phrases from extant charters at hand, hence the huge linguistic variation not only in the formulae, but also in the spelling and morphology (Amelotti and Costamagna 1975, 215–216; Sabatini 1965, 975–976; Korkiakangas 2022 and the references therein). On the other hand, variance was a significant part of medieval writing in general, as medieval writing largely consisted in reworking previously existing textual matter (Cerquiglini 1989, 57–59).

§ 4 Charters are the only substantial originally preserved written evidence available as a long and coherent time series from the European Middle Ages, and they are extensively used for the historical study of medieval textual culture (on the early Middle Ages, e.g., McKitterick 1989, Bougard 2013, Schwarzmaier 1972). In-depth quantitative research on the variation of documentary formulae in time and place is bound to reveal thus far unnoticed mechanisms and trends not only in the (re)organization of documentary practices, but also in the transmission of cultural influences and over-regional intellectual and professional exchange, as will be suggested in section 6.3. In the present paper, such phenomena are approached within Italian, and more precisely Tuscan, charters because they provide rich data beginning from the first part of the 8th century, a century essential for the history of the constat clause (see section 6.1.). Italian archives host thousands of early medieval Latin charters, the major early concentration being the archives of the Luccan archiepiscopal see (nowadays the Archivio storico diocesano di Lucca) with their almost 1,800 charters prior to AD 1000 and 156 prior to AD 774, the end of the Lombard era (Todros 2010, xi). Although abundant, the other Italian archival fonds cannot be aligned with Tuscan data because they largely date from after the eighth century; this would invalidate the chronological and geographical comparisons proposed in sections 5 and 6.

§ 5 Analyses such as Tjäder 1955 and Tjäder 1982 on the mainly sixth-century Ravenna papyri, Zielinski 1972 on the charters of the abbey of Monte Amiata, and Ghignoli’s extensive work on the early medieval charters of Pisa (e.g., Ghignoli 2007; Ghignoli and Bougard 2011) prove that it is possible to deal manually with tens of occurrences of a given formula at a time. By contrast, it is practically impossible to analyze several hundreds or thousands of occurrences and relate them to contextual metadata without a computer-assisted quantitative approach: the human eye easily fails to notice regular patterns of co-variation in large datasets with several varying elements, let alone their relation to chronological and geographical context.

§ 6 In recent years, various initiatives have emerged that apply text mining to medieval charter formulae. The pioneering “Documents of Early England Data Set” (DEEDS) project at the University of Toronto, launched as early as the 1970s, has systematically exploited the variation of medieval documentary formulae to predict charters’ other textual-historical properties in a large dataset (DEEDS 2023). DEEDS does not aim at a reconstruction or in-depth study of the formula repertoire as such, but at using repetitive word patterns as fingerprints to date undated charters and to identify unknown scribes (Gervers et al. 2018). Of notable methodological importance are also Nicolas Perreaux’s studies of the diachronic change of lexical concepts within the hundreds of thousands of charters of the Cartae Europae Medii Aevi (CEMA) corpus (CEMA 2023; Perreaux 2021). For example, Perreaux utilizes terms recurring in charters as a socio-historical indicator of intellectual drifts in medieval Europe (Perreaux 2016). Studies with a more specific focus on the identification of formulae include de Valeriola’s application of data-mining methods to the recognition of named entities and recurrent textual patterns of charters, as well as Ostrowski’s work that explicitly aims at extracting traditional diplomatic sections, such as intitulatio, apprecatio, and so forth, in German medieval Königsurkunden, an approach already experimented with, albeit on a considerably smaller scale, by Galuščáková and Neužilová (de Valeriola 2020; Ostrowski 2021; Galuščáková and Neužilová 2018). On the other hand, the stylometric approach of Leclercq and Kestemont disentangles charters’ multiple authorial strata and, to that end, also examines the scribes’ differing formula uses (Leclercq and Kestemont 2021).

2. Data

§ 7 The research data comes from charters written between AD 714 and 996 in historical Tuscia, a territory that mainly corresponds to modern Tuscany. Tuscian charters form abundant time series and are largely accessible in digital format and in good editions (see below). Only originals and coeval copies are taken into account, not charters copied in later centuries. Early medieval Italian charters are dated and their scribes known, which enables sophisticated philological and socio-historical corpus-based research settings (e.g., Korkiakangas 2023). AD 714 is the date of the first document surviving as an original from Tuscia, while the end date (AD 996) comes from the author’s recent digital edition of selected tenth-century charters from the Luccan episcopal archives (LLCT3, Korkiakangas 2021). The following text collections were utilized:

  • Late Latin Charter Treebank 1 (LLCT1): 519 Tuscian charters from AD 714 to 869

  • Late Latin Charter Treebank 2 (LLCT2): 520 Tuscian charters from AD 774 to 897

  • Late Latin Charter Treebank 3 (LLCT3): 72 Luccan charters from the 10th century

  • Chartae Latinae Antiquiores 2, vol. 58, 61–63 (ChLA 2): 117 charters of the 9th century from Pisa, Volterra, and San Salvatore di Monte Amiata

  • Codex diplomaticus Amiatinus, vol. 1 (CDA): 19 charters of the 10th century from San Salvatore di Monte Amiata

  • Carte dell’Archivio arcivescovile di Pisa, fondo arcivescovile, vol. 1 (AAP): 35 charters of the 10th century from Pisa

§ 8 In total, 1,024 occurrences (156 occurrences of constat and 868 of manifestus), distributed across 1,006 individual charters, were detected in the text collections above. These occurrences form the research corpus to be analyzed in the present study. The Late Latin Charter Treebanks 1–3, which are in digital treebank format, account for 93% of the occurrences, while the occurrences in the other three collections were extracted from printed volumes following the procedure described in section 4.

§ 9 Almost all the charters that survive from Lucca until AD 897 are included in LLCT1–2, whereas LLCT3 is a sample of approximately 10% of surviving Luccan charters between AD 900 and 1000. The full coverage of charters surviving from Tuscia outside Lucca ends with the Chartae Latinae Antiquiores in AD 900. From the tenth century, only the charters of San Salvatore di Monte Amiata and of the Archiepiscopal archives of Pisa are included in the corpus. This is because, from these locations, charters also survive from the previous centuries; as stated in section 1, there are more and more charters available from other Tuscian localities from the tenth century, but they are not included because of the very belatedness of their chronological distribution.

§ 10 LLCT1–2 are based on the following digitized copyright-free editions: Codice diplomatico longobardo, vol. I–II, ed. Luigi Schiaparelli (Roma: Istituto storico italiano, 1929–1934): charters from all Tuscia until AD 774; Memorie e documenti per servire all’istoria del Ducato di Lucca, tomo V, parte II, ed. Domenico Barsocchini (Lucca 1837): Tuscian (mainly Luccan) charters from Luccan archives from AD 775–897; Memorie e documenti per servire all’istoria del Ducato di Lucca, tomo IV, parte II, ed. Domenico Bertini (Lucca 1836): Tuscian (mainly Luccan) charters from Luccan archives from AD 801–897; Codice diplomatico toscano, tomo II, parte I, ed. Filippo Brunetti (Lucca 1833): Tuscian charters from outside Lucca from AD 774–813. When the LLCT corpora were built, all the texts drawn from these partly outdated editions were revised against the readings of the copyrighted Chartae Latinae Antiquiores volumes. The ChLA numbering is used for the charters of LLCT1–3 as well.

§ 11 The curves in Figure 1 and Figure 2 present the absolute numbers of charters underlying the research corpus in each decade. The bars indicate the percentages of charters with a constat clause (Figure 1) and a manifestus clause (Figure 2) among all the charters of each decade. For example, 60 charters from the 760s are included in the corpus, of which 20% contain a constat clause, 47% a manifestus clause, and the remaining 33% neither. Although the total numbers of charters vary considerably from decade to decade, the constat and manifestus clauses are a prominent way of opening the dispositive part of a charter from the very beginning, with their aggregate share ranging from 50% to 100% of all charters per decade (the average is 77%). Moreover, the constat clause shows a clearly decreasing chronological trend and the manifestus clause a clearly increasing one. Consequently, the two formulae are likely to be a fruitful case for detailed examination with quantitative methods.
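The per-decade percentages described above can be illustrated with a minimal Python sketch. It is not the procedure used for the figures: the function name `decade_shares` and the toy records are hypothetical, and the counts 12 and 28 are merely back-calculated from the percentages reported for the 760s (20% and 47% of 60 charters).

```python
from collections import Counter

def decade_shares(charters):
    """Compute, per decade, the number of charters and the share of each clause.

    `charters` is an iterable of (year, clause) pairs, where clause is
    "constat", "manifestus", or None (hypothetical input format).
    """
    totals = Counter()
    by_clause = Counter()
    for year, clause in charters:
        decade = year // 10 * 10  # e.g. 764 -> 760
        totals[decade] += 1
        if clause is not None:
            by_clause[(decade, clause)] += 1
    return {
        decade: {
            "n": n,
            "constat": by_clause[(decade, "constat")] / n,
            "manifestus": by_clause[(decade, "manifestus")] / n,
        }
        for decade, n in sorted(totals.items())
    }

# Toy data for the 760s: 60 charters, of which 12 with constat,
# 28 with manifestus, and 20 with neither (assumed counts).
toy = (
    [(760 + i % 10, "constat") for i in range(12)]
    + [(760 + i % 10, "manifestus") for i in range(28)]
    + [(760 + i % 10, None) for i in range(20)]
)
shares = decade_shares(toy)
for decade, row in shares.items():
    print(decade, row["n"], f"{row['constat']:.0%}", f"{row['manifestus']:.0%}")
```

With these assumed counts, the sketch yields the shares quoted in the text for the 760s; real input would be the dated charters of the corpus.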

Figure 1

The proportion of constat clauses in all charters per decade (bars) and the absolute numbers of all charters per decade (curve).

Figure 2

The proportion of manifestus clauses in all charters per decade (bars) and the absolute numbers of all charters per decade (curve).

§ 12 On the other hand, the present corpus by no means claims to be representative of the early medieval Tuscian documentary production. It merely reflects what survives to date. The survival of historical sources is always fortuitous and may seriously mislead interpretation. This risk also applies to conclusions drawn from early medieval charters: given that constat and manifestus are tightly connected to document types (sections 5.1. and 5.2.), the distribution of constat and manifestus is likely to reflect, in part, which types of documents were considered worth keeping in the archives. The economically and often even politically important charta libellaria type lease contracts (usually with manifestus) may have been kept more carefully than some less relevant document types, like ordinations, which only involved the in-house economy of the diocese (Witt 2012, 63) and which, on the other hand, happen to have made more use of constat. Nevertheless, the clear chronological trends – the decrease of constat and the expansion of manifestus – seen in Figure 1 and Figure 2 suggest that such a bias does not change the big picture of diachronic change.

§ 13 In the present study, charters are classified into five main document types on the basis of close reading. In addition, three less frequent subtypes are distinguished under sales and lease contracts (marked with bullets):

  1. donations (N = 158)

  2. exchange contracts (N = 104)

  3. sales contracts (N = 205), including:

    • dispensations (N = 32): pious donations post obitum in which the testator designates executors, dispensatores, who donate the property of the deceased to a given church (or sell it and donate the selling price)

  4. lease contracts (N = 142): lease contracts with no mention of the word libellus (originally “petition”) or libellario nomine/ordine “in terms of libellus,” chronologically preceding chartae libellariae, including:

    • ordination contracts (N = 40): lease contracts in which a priest is ordained into a church and is obliged to pay rent to the cathedral

    • repromissions (N = 10): lease contracts where the recipient promises (repromitto “to promise”) to take care of the property leased to him

  5. chartae libellariae (N = 415): lease contracts which explicitly mention the word libellus or the libellario nomine/ordine condition

§ 14 In the analyses of section 5 and section 6, the infrequent subtypes will be occasionally collapsed into the respective main class, especially with constat clauses: dispensations are treated together with sales contracts and ordination contracts and repromissions with leases. Figure 3 presents the chronological distribution of the five main document types in the corpus. Donations, sales contracts, and traditional leases gradually give way to chartae libellariae, which become the predominant document type in the ninth century, particularly in Lucca (Ghignoli 2009, 2–11).

Figure 3

The numbers of charters containing constat or manifestus clauses by document type per decade.

3. The history of constat and manifestus clauses

§ 15 By constat and manifestus clauses, the present study understands sentences which contain those precise lexical elements, and which typically open the dispositive part of the charter. Disposition (dispositio) is a charter’s declarative part, which conveys the case-specific contents of the legal act. As Figure 1 and Figure 2 indicated, constat and manifestus clauses are frequent, but they are not the only opening clauses. Ideo or ideoque “therefore” typically open the disposition in southern Italian and some Tuscian charters, while praevidi ut “I provided that” is often attested at Monte Amiata, but they are beyond the scope of the present paper.

§ 16 The sentence in (1) in section 1 is the earliest occurrence of the constat clause in the present research corpus. Its wording consists of elements that are already attested in the legal and administrative jargon of the Roman Empire and in the sixth-century Ravenna papyri (Tjäder 1982, 9–10; Ghignoli and Bougard 2011, 255). The formula in (1) presents an impersonal construction consta(t) me “it is certain/evident for me,” followed by the name(s) and title(s) of the author and a temporal expression hac (here hanc) die “today.” After that follow two dispositive verbs, i.e., the semantic core of the transaction, which are represented both by infinitive perfects dependent on consta(t) and by coordinate clauses: each infinitive, vendidisse (here vendedisset) “to have sold” and tradidisse (here tradedisset) “to have handed over,” is followed by a corresponding coordinated finite clause with an indicative perfect, et vendidi (here vendedi) “and I sold” and et tradidi (here tradedi) “and I handed over.”

§ 17 This seemingly duplicate (infinitive perfect + indicative perfect) formulation is probably due to the fact that the formula was originally meant to express both the vendor’s subjective statement that he or she had received the selling price (the accusative with infinitive construction) and an objective ascertainment of that statement (in the indicative). This is suggested by a formulation in the third person with fatetur “(he/she) affirms,” found in a Ravenna papyrus: quique fatetur se distraxisse, et distraxit, adque tradidisse, et tradidit “and he affirms that he has alienated – and so did he alienate – and to have handed over – and so did he hand over” (Pap. Tjäder 2.29.9, AD 504, Ravenna). The same formulation is found in the late seventh/early eighth-century Formulae Marculfi (e.g., 2.26) and in a Milanese charter from AD 725 (ChLA 1.845).

§ 18 The earliest known attestation of the duplicate construction, although introduced by dixit “(he) said/declared,” is in a Dacian wax tablet from AD 163: dixit se locasse, et locavit, Socrationi Soc<r>atis operas suas “he declared to have leased – and so did he lease – his building sites to Socratio, son of Socrates” (CIL 3.948.9). Thus, the constat clause, with or without the duplicate construction, seems to have been a widespread charter formula during the Late Empire. Although the last occurrence in the present Tuscian corpus is from AD 951, instances, even with duplicate constructions, are still found in northern Italian charters of the late twelfth century and southern Italian charters of the eleventh century, and they are unlikely to be the last ones: for example, constat nos […] manifesti sumus quod accepimus a vobis […] libras tredecim “it is certain for us, we manifest that we received from you […] thirteen pounds [of silver]” (CDLM Lenno 93, AD 1184, Lenno); constat me vendere atque in presentem per fustem trado atque venundavo “it is certain for me to sell, and I presently hand over by the ritual staff and sell” (CDC 10.18, AD 1073, [Canne]). In northern Italian charters, constat is quite frequent beginning from the eighth century.

§ 19 Outside Italy, essentially similar constat clauses are occasionally found in charters written in many other parts of the Carolingian world. In the charters of St. Gall, constat can refer both to the first person, as in costa me dare adque donare “it is certain for me to give and donate” (UASG 1.9, AD 744) and, seemingly at a later stage, to the third person, as in constat eum vindere et vindedit “it is certain for him to sell – and he did sell” (UASG 1.285, AD 826). The few instances that I have found from Neustria and Austrasia are all from the ninth century: for example, constat me […] vendidisse, et ita vendidi, tradidisse et de presenti tradidi “it is certain for me […] to have sold – and so did I sell” (Cartularium 180, AD 806); constat nos tibi vendidisse et ita vendimus “it is certain for us to have sold to you – and so did we sell” (CPLAD 1087, Lorsch).

§ 20 The adjective manifestus “apparent, manifest” is also an old piece of equipment in the Roman juridical toolkit, while the earliest attestations of impersonal manifestum est and personal manifestus sum constructions are found in Latin Bible translations of the third/fourth centuries and may be partly Greek influence: impersonal in Itala Baruch 6.68, personal in Vulg. I Ioh. 2.19 (TLL 8.312.40–42). In the sixth century, such constructions are frequent in Justinian’s Novellae: impersonal, for example, manifestum namque est quia “for it is clear that,” in Novell. 72.5; personal, for example, manifestus est quia is qui fit administrator “it is clear that the person who will become the manager,” in Novell. 72.1 (TLL 8.312.59–62). In the Ravenna papyri, manifestus is only attested once, provided that the restoration is correct: m[anifestum est] (Pap. Tjäder 1.4–5.B1.8, AD 552–575, Ravenna).

§ 21 In early medieval Tuscian charters, manifestus sum/manifestum est introduces a larger variety of phrases and elements than constat. Manifestus sometimes opens lengthy narrative parentheses about previous events that serve as the background for the legal transaction recorded in the charter, but the most common environment is leases, especially chartae libellariae (Ghignoli 2009, 16–17; Ghignoli and Bougard 2011, 283–284). The manifestus clause as such is usually rather terse, as is shown by the sentence in (2) above: the predicate is followed by a complement clause (here quia), an instrumental specification per hanc chartulam “by way of this charter,” and the dispositive verb (here offero “to offer”). The latest known attestation of manifestus (quia) in Luccan charters is in a charta libellaria (RCI 1683) from AD 1193.

§ 22 Unlike constat, the manifestus clause seems to be predominantly an Italian phenomenon. Outside Tuscia, manifestus is already found a few times in northern Italian charters in the eighth century, often declaring the receipt of the selling price: for example, manifesto sum […] qualiter acepesset, secudi et in presentia coram testibus acepi, […] soledus quinque “I manifest […] that I received, as I presently received before witnesses, […] five solidi” (ChLA 1.858, AD 793, Mendrisio). Variants of the clause continue to be frequent at least until the twelfth century in the North, while in the southern parts of the Peninsula, this assurance clause containing manifestus typically appears long after the dispositive part of the charter and is attested from the ninth century until at least the eleventh century, for example, de quo pro constabiliscendam tibi hanc nostra venditionem manifesti sumus quod a presentem recepimus a te […] uno solido de auro “therefore, to conclude this our trade, we manifest that we have presently received from you […] one gold solidus” (CDC 9.30, AD 1067, Lucera).

4. A framework for managing documentary formulae

§ 23 The term text reuse means the meaningful reiteration of text and is typically used in text reuse detection (TRD) within digital literary studies, where allusions to and quotations and paraphrases of other literary works are identified. Text reuse in documentary texts differs from literary reuse in that formulae form a relatively closed lexical repertoire that is reproduced in all the documents of the same document type more or less faithfully. The challenge is to manage this easily detectable intra-formula variation, that is, the tens to thousands of occurrences of a slightly varying formula that appears in virtually each charter of a corpus. This section describes the workflow that is used here to reduce the intra-formula variation of early medieval constat and manifestus clauses into a set of human-readable formula templates (section 6).

§ 24 Since Latin is a highly inflectional language and since the spelling of early medieval charters is highly non-standard, lemmatized data is particularly useful for the management of documentary text reuse in Latin charters. Thus, the point of departure is the lemmatized and morphologically and syntactically annotated LLCT1 and LLCT2 dependency treebanks in nested XML format. The PML Tree Query search engine in the TrEd Treebank Editor was used to extract the sentences that contained the test case lemmas, i.e., consto and manifestus, which are exemplified by the syntax trees of Figure 4. These trees are TrEd visualizations of the nodes, node labels, and dependency relations between the nodes based on the treebank annotation. However, LLCT3, the Chartae Latinae Antiquiores, the Codex diplomaticus Amiatinus, and the Carte dell’Archivio arcivescovile di Pisa, which constitute ca. 7% of the data, are so far unlemmatized and unannotated; of them, only LLCT3 is available as an electronic text. From these collections, the author had to transcribe the constat and manifestus clauses manually and lemmatize them with a simple find-and-replace script trained on the LLCT1–2 data.
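A lookup-based lemmatizer of the kind mentioned above might look roughly like the following Python sketch. The actual script is not reproduced in this paper, so the function names `train_lookup` and `lemmatize`, the fallback behaviour for unknown forms, and the handful of training pairs are all assumptions made for illustration; the word forms themselves are taken from sentence (1) and Table 2.

```python
from collections import Counter, defaultdict

def train_lookup(pairs):
    """Build a form -> lemma table from annotated (form, lemma) pairs.

    When a form is attested with several lemmas, the most frequent
    lemma wins (an assumed tie-breaking policy).
    """
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form.lower()][lemma] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}

def lemmatize(tokens, lookup):
    """Replace each token by its lemma; unknown forms are kept lowercased."""
    return [lookup.get(tok.lower(), tok.lower()) for tok in tokens]

# Hypothetical training pairs standing in for the LLCT1-2 annotation.
training = [
    ("consta", "consto"), ("constat", "consto"),
    ("vendedi", "vendo"), ("vendedisset", "vendo"),
    ("tradedi", "trado"), ("tradedisset", "trado"),
    ("manifestum", "manifestus"), ("manifestu", "manifestus"),
]
lookup = train_lookup(training)
lemmas = lemmatize("consta me Aufrid hanc die vendedisset et vendedi".split(), lookup)
print(lemmas)
```

A plain lookup of this kind cannot disambiguate homographs by context, which is one reason why manual checking of the output would remain necessary.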

Figure 4

Typical syntax trees with the lemmas consto (ChLA 1.741, AD 765) and manifestus (ChLA 1.1027, AD 772) visualized by TrEd Treebank Editor.

§ 25 Once all the sentences containing instances of consto and manifestus were identified, the Sketch Engine concordancer was used to create KWIC concordances for the lemmas consto and manifestus. Sketch Engine is an open-access online corpus manager and text analysis software (Sketch Engine 2023). For each occurrence of consto and manifestus, the author counted the frequencies of single lemmas or collocates of lemmas occurring within the same syntax tree, i.e., within the same sentence. The definition of these collocates was based on the three-grams produced by the Sketch Engine n-gram tool, but the approach involved an essentially supervised element based on the author’s linguistic knowledge: excluding lemmas with fewer than three (with consto) or ten (with manifestus) occurrences, the author selected a set of 32 lemmas/lemma collocates co-occurring with the lemma consto and a set of 39 lemmas/lemma collocates co-occurring with the lemma manifestus that were linguistically meaningful. As a result, these sets consist of the most frequent lemmas/lemma collocates co-occurring with constat and manifestus within the same sentences, as well as some less frequent but essential lemmas/lemma collocates that the author included because they were synonyms or antonyms of certain higher-frequency items (e.g., vendo and venumdo in the constat cluster 2, see section 6.1.).
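The frequency filtering described above was carried out with Sketch Engine and manual selection; the following Python sketch only illustrates the underlying counting logic on invented toy sentences. The function names are hypothetical, and the choice to count each lemma once per sentence is an assumption.

```python
from collections import Counter

def cooccurring_lemmas(sentences, target, min_count):
    """Count lemmas sharing a sentence with `target`, keeping those that
    reach `min_count`. Each lemma is counted once per sentence."""
    counts = Counter()
    for lemmas in sentences:
        if target in lemmas:
            counts.update(set(lemmas) - {target})
    return {lemma: n for lemma, n in counts.items() if n >= min_count}

def trigrams(lemmas):
    """Return all contiguous three-lemma sequences of a sentence."""
    return [tuple(lemmas[i:i + 3]) for i in range(len(lemmas) - 2)]

# Invented lemmatized sentences (not corpus data).
toy = [
    ["consto", "ego", "hic", "dies", "vendo", "et", "vendo"],
    ["consto", "ego", "hic", "dies", "vendo", "et", "vendo", "trado", "et", "trado"],
    ["consto", "ego", "per", "hic", "chartula", "vendo", "trado"],
    ["manifestus", "sum", "ego", "quia", "per", "hic", "chartula", "offero"],
]
# Threshold of three, as applied to consto in the text.
frequent = cooccurring_lemmas(toy, "consto", min_count=3)
print(frequent)
print(trigrams(toy[2]))
```

The supervised step, i.e., deciding which of the surviving candidates are linguistically meaningful formula elements, is a matter of expert judgment and is not modelled here.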

§ 26 In this study, these lemmas/lemma collocates are considered elements of the constat and manifestus clauses. Figure 4 shows two dependency syntax trees with typical environments of constat and manifestus (for the manifestus clause, cf. the sentence in [2]). Most co-occurring elements systematically occupy one fixed position within the formula, while others can occur in two or three alternative positions. Therefore, each element’s absolute position from the beginning of the sentence was measured. These position numbers defined the most typical linear order of the elements. The information on the presence or absence of each element in the sentence was turned into binary variables: each formula element constitutes a variable which receives the value 1 if a sentence contains that element and 0 if it does not. This operationalization enables statistical analyses which relate the distribution of elements to metadata, i.e., time, place, and document type. See the datasets with variables in Appendix 1.
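The operationalization into binary variables and position numbers can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the helper names are hypothetical, positions are taken to be 1-based token indices of the first match, and only four of the manifestus elements from Table 2 are included.

```python
def find_element(lemmas, element):
    """Return the 1-based position of the first occurrence of `element`
    (a tuple of lemmas) in the sentence, or None if it is absent."""
    n = len(element)
    for i in range(len(lemmas) - n + 1):
        if tuple(lemmas[i:i + n]) == element:
            return i + 1
    return None

def encode(lemmas, elements):
    """Return (binary variables, positions) for one lemmatized sentence."""
    positions = {name: find_element(lemmas, el) for name, el in elements.items()}
    binary = {name: int(pos is not None) for name, pos in positions.items()}
    return binary, positions

# A small subset of formula elements, keyed by their label in Table 2.
elements = {
    "quia": ("quia",),
    "per hanc chartulam": ("per", "hic", "chartula"),
    "offerui": ("offero",),
    "vendidi": ("vendo",),
}
# Simplified lemma sequence modelled on sentence (2).
sentence = ["manifestus", "sum", "ego", "quia", "per", "hic", "chartula", "offero"]
binary, positions = encode(sentence, elements)
print(binary)
print(positions)
```

Each sentence thus becomes one row of 0/1 values, and stacking the rows gives the case-by-variable matrix on which a cluster analysis can be run.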

§ 27 Cluster analysis makes it possible to detect co-variation patterns in large datasets by reducing their complexity into a few clusters of similar cases. These clusters can then be cross-tabulated with context variables to visualize their relation to the real world. Cluster analysis is a general term for unsupervised classification analyses which group cases that are maximally similar to each other into clusters that, in turn, are maximally dissimilar from each other. The non-hierarchical Two-Step Cluster analysis procedure was run in the SPSS 27 statistical software platform on the binary element variables explained above, separately for constat and manifestus (IBM 2023). The Two-Step Cluster analysis is a hybrid approach which first uses a distance measure (here Log-likelihood) to separate groups and then a probabilistic approach to choose the optimal subgroup model. It has been shown to perform consistently better than traditional hierarchical cluster techniques in terms of the number of subgroups detected, classification probability of individuals to subgroups, and reproducibility of findings, as long as all the variables are of the same type, as is the case here (Benassi et al. 2020; Bacher et al. 2004). With constat (N = 156), a solution of three clusters, as proposed by the Two-Step procedure, was chosen; with manifestus (N = 868), a solution of six clusters was chosen. These solutions seemed to be the most readily interpretable, although with manifestus, the Two-Step procedure proposed a solution of seven clusters, increasing the number of charta libellaria clusters to three (currently clusters 5 and 6; see section 6.2.). Both cluster solutions show a moderate Silhouette measure of cohesion and separation, as reported by SPSS (approximately 0.4). Table 1 and Table 2 present the distributions of the element variables across the clusters.
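To make the Silhouette measure more concrete, the following Python sketch computes a mean silhouette coefficient for binary element vectors with given cluster labels. It is emphatically not the SPSS Two-Step procedure: the Hamming distance used here is a swapped-in, simpler alternative to the Log-likelihood distance, so the sketch would not be expected to reproduce the value of approximately 0.4 reported above. The function names and the toy vectors are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def silhouette(vectors, labels):
    """Mean silhouette coefficient over all cases.

    For each case: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Singleton clusters get s = 0 by convention.
    """
    scores = []
    for i, v in enumerate(vectors):
        dists = {}
        for j, w in enumerate(vectors):
            if i != j:
                dists.setdefault(labels[j], []).append(hamming(v, w))
        own = dists.get(labels[i])
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != labels[i]]
        if not own or not others:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy binary element vectors in two clusters (invented, not corpus data).
vectors = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
           (0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1)]
labels = [0, 0, 0, 1, 1, 1]
score = silhouette(vectors, labels)
print(round(score, 2))
```

Values close to 1 indicate compact, well-separated clusters, values around 0 indicate overlapping clusters, and negative values suggest misassigned cases.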

Table 1

Distributions of the 32 constat element variables across clusters.

Constat clusters (N = 156) | Lemma(s) | Cluster 1 | Cluster 2 | Cluster 3
cluster size |  | 71 | 46 | 39
unde | unde | 0 | 0 | 2
ideo | ideo | 12 | 0 | 8
in Dei nomine | in deus nomen | 4 | 0 | 0
Deo auctore | deus auctor | 0 | 0 | 3
ego/nos | ego/nos | 69 | 46 | 36
quia | quia | 1 | 20 | 11
et quia | et quia | 0 | 6 | 4
qualiter | qualiter | 2 | 0 | 5
manifestus/-m sum/est | manifestus sum | 0 | 0 | 4
(a) praesenti die – preverbal | (ab) praesens dies | 0 | 21 | 2
(a) hac die | (ab) hic dies | 41 | 0 | 3
praesenti | praesens | 0 | 4 | 0
per hanc chartulam – preverbal | per hic chartula | 0 | 32 | 0
per hanc/praesentem paginam | per hic/praesens pagina | 0 | 1 | 2
per hunc libellum | per hic libellus | 0 | 0 | 3
benigna voluntate etc. | benignus voluntas etc. | 2 | 1 | 0
suadente etc. | suadeo etc. | 3 | 0 | 0
bono animo | bonus animus | 2 | 1 | 0
(in) libera potestate | (in) liber potestas | 14 | 0 | 0
vendidisse et vendidi | vendo et vendo | 69 | 0 | 0
tradidisse et tradidi | trado et trado | 25 | 0 | 0
firmasse et firmavi | firmo et firmo | 0 | 0 | 3
secutis clause | sequor | 2 | 0 | 0
de presenti – interverbal | de praesens | 5 | 0 | 0
vendidi | vendo | 3 | 39 | 0
tradidi | trado | 3 | 44 | 7
dedi | do | 2 | 1 | 9
venumdavi | venumdo | 0 | 4 | 1
suscepi | suscipio | 0 | 0 | 3
firmavi | firmo | 0 | 0 | 5
suppletum | suppleo | 0 | 0 | 5
de praesenti – postverbal | de praesens | 5 | 0 | 0
praevideo | praevideo | 0 | 14 | 2

Table 2

Distributions of the 39 manifestus element variables across clusters.

Manifestus clusters (N = 868) | Lemma(s) | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
cluster size |  | 171 | 142 | 91 | 163 | 115 | 186
personal construction with sum |  | 21 | 131 | 69 | 139 | 104 | 186
form: manifestu |  | 3 | 98 | 62 | 107 | 70 | 157
constat – preverbal | consto | 3 | 0 | 0 | 0 | 0 | 0
ego – preverbal | ego | 7 | 1 | 0 | 0 | 0 | 0
quia – preverbal | quia | 5 | 0 | 0 | 0 | 0 | 0
sum | sum | 167 | 142 | 90 | 163 | 115 | 186
ego/nos | ego/nos | 161 | 140 | 89 | 163 | 114 | 186
(eo) quod | (eo) quod | 6 | 2 | 0 | 0 | 0 | 0
quia | quia | 156 | 128 | 91 | 163 | 115 | 185
qualiter | qualiter | 3 | 0 | 0 | 0 | 0 | 0
ante hos annos | ante hic annus | 11 | 5 | 0 | 0 | 0 | 0
(ante) has dies | (ante) hic dies | 6 | 5 | 0 | 1 | 0 | 0
per chartulam | per chartula | 24 | 47 | 0 | 118 | 97 | 186
per hanc chartulam | per hic chartula | 59 | 55 | 0 | 23 | 8 | 0
propter hanc chartulam | propter hic chartula | 7 | 0 | 0 | 0 | 0 | 0
(per) ha(n)c pagina(m) | (per) hic pagina | 0 | 2 | 0 | 0 | 0 | 0
pro remedio animae X | pro remedium anima X | 26 | 9 | 0 | 0 | 0 | 0
pro animae X remedio | pro anima X remedium | 10 | 22 | 0 | 0 | 0 | 0
propter/pro Dei (omnipotentis) timore et remedio animae | propter/pro deus (omnipotens) timor et remedium anima | 1 | 3 | 0 | 0 | 0 | 0
convenit | convenio | 7 | 3 | 89 | 4 | 0 | 0
dedi | do | 18 | 7 | 0 | 163 | 0 | 184
facere deberemus | facio debeo | 0 | 0 | 90 | 0 | 0 | 0
viganeum/cambium/commutationem | viganeum/cambium/commutatio | 7 | 0 | 91 | 0 | 0 | 1
libellario nomine/ordine | libellarius nomen/ordo | 0 | 0 | 0 | 121 | 110 | 176
ad censum per(ex)solvendum | ad census per(ex)solvo | 0 | 0 | 0 | 20 | 3 | 186
ad laborandum/continendum | ad laboro/contineo | 2 | 0 | 0 | 76 | 0 | 2
firmavi | firmo | 9 | 2 | 0 | 1 | 111 | 0
ordinavi | ordino | 18 | 15 | 0 | 0 | 1 | 0
confirmavi | confirmo | 20 | 12 | 0 | 0 | 0 | 0
offerui | offero | 60 | 36 | 0 | 8 | 1 | 0
vendidi | vendo | 6 | 30 | 0 | 0 | 0 | 0
venumdavi | venumdo | 5 | 22 | 0 | 0 | 1 | 0
tradidi | trado | 13 | 29 | 0 | 4 | 0 | 0
petivi | peto | 11 | 0 | 0 | 2 | 0 | 0
rogavi | rogo | 10 | 0 | 0 | 0 | 0 | 0
repromisi | repromitto | 1 | 4 | 0 | 0 | 0 | 0
construxi/aedificavi/fabricavi | construo/aedifico/fabrico | 36 | 6 | 0 | 1 | 0 | 1
decrevi | decerno | 2 | 25 | 0 | 0 | 0 | 0
constitui/institui | constituo/instituo | 8 | 2 | 0 | 0 | 0 | 0
manum (meam) facio | manus (meus) facio | 0 | 3 | 0 | 0 | 0 | 0
praevideo | praevideo | 27 | 1 | 0 | 0 | 2 | 0

§ 28 In Table 1 and Table 2, the elements are listed in the order they usually appear in charters. The numbers indicate in how many cases each element variable has value 1 in each cluster. The colour accentuates the highest numbers. Section 5 examines the associations of these formula clusters with time, place, and document type, while section 6 presents their element distributions as text reuse templates that help in discerning the main variants of the constat and manifestus clauses as well as their usual wording.
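The counts in Table 1 and Table 2 are simple tallies of value 1 per element variable and cluster. The following sketch shows one way such a tally could be produced; the data structure (a list of cluster/element-dictionary pairs) and the toy cases are assumptions made for illustration and do not reflect how the data are stored in SPSS.

```python
from collections import defaultdict


def element_counts(cases):
    """Count, per cluster, how many cases have value 1 for each element variable.

    `cases` is a list of (cluster_id, {element_name: 0 or 1}) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cluster_id, elements in cases:
        for name, value in elements.items():
            # Adding the binary value counts only the cases coded 1.
            counts[cluster_id][name] += value
    return {cluster: dict(per_element) for cluster, per_element in counts.items()}


# Hypothetical toy cases, not taken from the corpus.
toy_cases = [
    (1, {"vendidisse et vendidi": 1, "per hanc chartulam": 0}),
    (1, {"vendidisse et vendidi": 1, "per hanc chartulam": 0}),
    (2, {"vendidisse et vendidi": 0, "per hanc chartulam": 1}),
]
table = element_counts(toy_cases)
```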

5. Testing the digital diplomatic validity of the framework

5.1. Chronological, geographical, and document type distribution of constat clusters

§ 29 For the clusters to be useful for historical-philological study, they must be meaningfully associated with external factors, here with the distributions of the metadata variables time, place, and document type. Figure 5 visualizes the document type and writing place distributions of the three constat clusters in time, while, in the following section, Figure 6 visualizes the same distributions of the six manifestus clusters. The colour of the marker denotes the cluster membership, while the marker’s shape stands for the document type. Note that, as the curve in Figure 1 and Figure 2 showed, the chronological distribution of charters of the research corpus is uneven. Therefore, a large conglomeration of markers around a certain point of time in Figure 5 and Figure 6 does not necessarily mean that constat or manifestus were proportionally frequent at that time but that more charters survive from that period.
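The caveat about uneven survival can be made concrete by relating formula occurrences to the number of surviving charters per period. The sketch below assumes per-decade counts held in plain dictionaries; the function name and all figures are hypothetical and serve only to show the arithmetic.

```python
def relative_frequency(formula_counts, charter_counts):
    """Share of surviving charters per period that contain the formula."""
    return {
        period: formula_counts.get(period, 0) / total
        for period, total in charter_counts.items()
        if total > 0  # skip periods from which no charter survives
    }


# Hypothetical counts per decade (not taken from the corpus).
toy_charters = {"760s": 20, "800s": 80}
toy_constat = {"760s": 10, "800s": 16}
shares = relative_frequency(toy_constat, toy_charters)
```

In this invented example, the 800s contain more occurrences in absolute terms (16 against 10), but the formula is proportionally more frequent in the 760s (50% against 20%).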

Figure 5

The constat clusters of TwoStep cluster analysis plotted by document type, writing place, and date.

Figure 6

The manifestus clusters of TwoStep cluster analysis plotted by document type, writing place, and date.

§ 30 The patterns of Figure 5 imply slight correlations between the clusters and document type, writing place, and date. It is immediately noticed that constat is particularly typical of sales contracts: indeed, no less than 126 (81%) of all the 156 constat occurrences are found in charters classified as sales contracts. Clusters 1 and 2 are predominant sales contract clusters, while cluster 3 embraces all other document types. Figure 5 (like Figure 6) arranges the writing places roughly by their latitude co-ordinates from north to south within Tuscia. Here, the primary observation is that 58 (37%) of the 156 charters with constat were written in Lucca and that sales contract cluster 2 mainly includes Luccan charters, while the other clusters are more pan-Tuscian, with cluster 1 focusing on the Monte Amiata region, the sphere of influence of the monastery of San Salvatore di Monte Amiata, in south-east Tuscia.
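The shares quoted in this paragraph follow from a cross-tabulation of cluster membership against a context variable. A minimal sketch of that arithmetic is given below; the two totals are the figures quoted above, whereas the small list of cluster/document-type pairs is invented for illustration.

```python
from collections import Counter


def crosstab(pairs):
    """Count co-occurrences of (cluster, context value) pairs."""
    return Counter(pairs)


def share(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Figures quoted in the paragraph above.
sales_share = share(126, 156)  # constat occurrences in sales contracts
lucca_share = share(58, 156)   # constat charters written in Lucca

# Hypothetical toy pairs of (cluster, document type).
toy = crosstab([(1, "sale"), (1, "sale"), (2, "sale"), (3, "lease")])
```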

§ 31 There also seems to be a chronological split within sales contracts. Originally, cluster 1 is the main sales contract type in Lucca like elsewhere in Tuscia, but after AD 777, no cluster 1 occurrence is found in Lucca, whereas cluster 2 becomes more and more frequent from the 760s onwards. After AD 820, constat is only used (eight times) in sales contracts in Lucca until its last occurrence in AD 856. Constat is not used in Lucca in other document types than sales contracts after an ordination (lease) in AD 816, while earlier it occurs in them every now and then. Constat continues to occur in Tuscia outside Lucca in the full ninth century and beyond, the last case in the corpus being an exchange at Monte Amiata in AD 951. Indeed, the constat clause seems to be a common opening clause of the disposition for almost any document in southern Tuscia and, as such, it probably represents an older tradition which is still visible in the oldest charters of northern Tuscia but subsequently vanishes. Cluster 3 is not particularly differentiated geographically.

§ 32 Among the 65 charters produced in the Monte Amiata region, only seven occurrences of the later sales contract cluster 2 are found, all between AD 827 and 907, while the older sales contract cluster 1 remains the predominant type right up until the 840s. Although cluster 3 subsumes most of the other document types, even it includes, until AD 819, some sales contracts (the three blue 810s-“outliers” in Lucca in Figure 5; see section 6.1.). In chartae libellariae, constat is only used in the Monte Amiata region (five times, included in cluster 3), while elsewhere chartae libellariae are tightly associated with manifestus. Except for the above-mentioned seven occurrences at Monte Amiata, the later sales contract type of cluster 2 is found outside Lucca four times in the nearby Pisa region, once (and very late) in Prato (north-east of Lucca), and perhaps surprisingly, twice in Maremma, i.e., central maritime Tuscia, Montioni (Suvereto, ChLA 2.73.2, AD 807) and Bibbona (ChLA 2.79.48, AD 850), both charters being preserved in the Episcopal Archives of Lucca. This may be because the Luccan church had possessions in the region and the documents, perhaps, were produced under Luccan influence (Bertini 1972, 26–27).

§ 33 To summarize, the present evidence suggests that the later, simple constat clause of the sales contract cluster 2 is likely to be an innovation introduced in Tuscia by Luccan scribes. Cluster 1 includes older sales contracts all over Tuscia and differs from cluster 2 mainly in its use of the duplicate verbal construction (see section 6.1.). Cluster 3 is a residual class that includes leases, donations, exchanges, and other contracts.

5.2. Chronological, geographical, and document type distribution of manifestus clusters

§ 34 Figure 6 visualizes the document type and writing place distributions of the six manifestus clusters in time. Note that, because the geographical variation is smaller, a less granular writing place classification is used than with constat.

§ 35 The patterns of Figure 6 again suggest a connection between formula elements, document type, and date, while the connection with a writing place is less varied than with constat: manifestus is markedly connected with Lucca (87%). Manifestus is particularly typical of chartae libellariae (47%), but it is also attested in abundance with donations (17%), lease contracts that are not formally chartae libellariae (15%), and exchanges (11%).

§ 36 Cluster 1 mostly consists of elements that were used indiscriminately in all document types during the early part of the time span under examination, until around the 810s, with some later occurrences especially in donations. The only document type not at all associated with cluster 1 is the charta libellaria, which only begins to appear from AD 807 onwards. Cluster 2 is also distributed over several document types, except for chartae libellariae and exchanges, and it only includes sporadic exemplars of other leases. Cluster 2 represents a chronologically advanced version of cluster 1, although it also includes some very early occurrences. Clusters 1 and 2 seem to be pan-Tuscian. Cluster 3 is exclusively an exchange cluster, while cluster 4 represents leases and is mainly limited to (early) ninth-century lease contracts, both chartae libellariae and other. Clusters 5 and 6 are predominantly libellary clusters, but they also comprise a few other leases. Cluster 5 is chronologically older than cluster 6. Clusters 3 to 6 are chiefly Luccan, although the later charta libellaria cluster 6 appears to have spread to the Pisa region by the tenth century.

§ 37 In sum, the present manifestus clusters reflect document types rather faithfully (strong Pearson correlation r = 0.78 with the five-class document type classification). Manifestus clusters also reflect diachronic evolution (moderate Pearson correlation r = 0.50). The good cluster/document type match results from the fact that the clustering is determined by frequent elements, among which those that define the document type appear. For example, exchange cluster 3 is heavily determined lexically: the lemma viganeum/cambium/commutatio “exchange” makes it impossible to mistake a charter for anything else than an exchange.
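For readers who wish to see the arithmetic behind a Pearson coefficient, a minimal sketch follows. How the cluster and document type categories were numerically coded for the correlations reported above is not stated here, so the integer codes below are purely hypothetical; with nominal categories, the value of r depends on the coding chosen.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(var_x * var_y)


# Hypothetical integer codes for six charters (not taken from the corpus).
toy_cluster = [1, 1, 2, 3, 3, 4]
toy_doc_type = [1, 1, 2, 3, 4, 4]
r = pearson_r(toy_cluster, toy_doc_type)
```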

6. Analysis of formula clusters as text reuse templates

§ 38 This section examines the distribution of individual formula elements across clusters in terms of text reuse templates. When the most typical elements of the constat and manifestus clauses are listed in their typical running order, the result is a template of those formulae. This section seeks to define three constat templates and six manifestus templates on the basis of the clusters introduced in the previous sections. For the sake of clarity, the most infrequent elements will be omitted in the templates (Figures 7 to 15) unless they turn out particularly important. The percentages of each element show how often that element occurs within the cluster. Note that placeholders for names and titles of the contracting parties (marked in capitals) appear in the templates, although they have no influence on the underlying cluster analysis. The objects of transaction are not displayed, as they embrace a wide variety of media of exchange, ranging from landed property to money to privileges.

Figure 7

Constat cluster 1 template: older sales contracts all over Tuscia, duplicate constructions (N = 71).

§ 39 Although the clustering and the calculations are realized on lemmas, inflected forms are given in the templates to ensure their readability. Moreover, the Latin of the templates is orthographically and morphologically normalized and all the elements that agree in number and/or in gender are reduced to respective singular forms: plural verbs in the singular and all the pronouns in the masculine singular form. The elements that are semantically equal and/or mutually replaceable are wrapped in boxes or linked with each other using connector lines.
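The step from cluster counts to a template, as described in this section, can be sketched as follows. The counts are a selection of rows for constat cluster 2 (N = 46) copied from Table 1; the function name and the 10% cut-off for "infrequent" elements are assumptions made for illustration, since no fixed threshold is stated in the text.

```python
def build_template(counts, cluster_size, min_share=0.10):
    """Keep elements in their running order if their share reaches min_share.

    `counts` is a list of (element, count) pairs, so the order is preserved.
    Returns (element, rounded percentage) pairs.
    """
    template = []
    for element, count in counts:
        if count / cluster_size >= min_share:
            template.append((element, round(100 * count / cluster_size)))
    return template


# Selected rows for constat cluster 2 (N = 46), copied from Table 1.
cluster2 = [
    ("ego/nos", 46),
    ("quia", 20),
    ("et quia", 6),
    ("(a) praesenti die", 21),
    ("praesenti", 4),
    ("per hanc chartulam", 32),
    ("vendidi", 39),
    ("tradidi", 44),
    ("venumdavi", 4),
    ("praevideo", 14),
]
template = build_template(cluster2, 46)
```

With the assumed cut-off, praesenti and venumdavi (4 of 46 cases each) drop out, while per hanc chartulam is retained at 70% and praevideo at 30%.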

6.1. Constat templates

§ 40 Constat cluster 1 covers the years AD 720 to 874, with a median of AD 788. The most central feature of the template (see Figure 7) is the duplicate verbal construction vendidisse et vendidi, which is virtually a prerequisite for cluster membership (69 times, 97%). In 24 cases (35%), vendidisse et vendidi is accompanied by another duplicate construction, tradidisse et tradidi, thus following the age-old schema presented in section 3. In the remaining 45 cases, vendidisse et vendidi is either followed by no other verb or by the simple (et) vendidi (2 times), (et) tradidi (3 times), or (et) dedi (2 times). The temporal specification (a) hac die is very frequent (58%) and is placed before the verbs, while an alternative temporal specification, de presenti “at the moment,” may stand either inside the duplicate construction, before its second member (vendidisse et de presenti vendidi), or after the verbs, both positions with a 7% share. The “double” duplicate construction is an early feature: the majority of its occurrences (63%) date from the first four decades of the corpus (720s to 750s), while there are only three occurrences in the ninth century, mostly in the Monte Amiata sphere. In Lucca, the “double” duplicate construction, like the duplicate construction in general, is abandoned early, the last case dating from AD 777.
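The chronological profile of a cluster (range, median, and the share of occurrences before a given year) can be computed as in the sketch below. The function names and the list of years are hypothetical and are not taken from the corpus.

```python
from statistics import median


def date_profile(years):
    """Return (earliest, median, latest) year of a cluster's charters."""
    return min(years), median(years), max(years)


def share_before(years, cutoff):
    """Proportion of charters dated earlier than the cutoff year."""
    return sum(year < cutoff for year in years) / len(years)


# Hypothetical charter dates (AD) for one cluster.
toy_years = [730, 760, 790, 820, 850]
profile = date_profile(toy_years)
early_share = share_before(toy_years, 800)
```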

§ 41 Cluster 1 also includes charters with formulae that express the vendor’s free will: e.g., benigna nostra uolumtatem “by our well-intentioned will” and nullus aliquis nos suadentes “with us being persuaded by no-one,” both from ChLA 1.802 (AD 730, Pisa), bono animus “by good will/soul” in ChLA 1.732 (AD 738, Chiusi). As Ghignoli and Bougard underline, one would not expect to find declarations of free will in sales contracts (Ghignoli and Bougard 2011, 288). They are, indeed, likely to have derived from donations, in which they appear in the Ravenna papyri (Pap. Tjäder 1.21.13–14: prono animo et spontanea voluntate, nullo cogente neque conpellente sed meae propriae deliberationis arbitrio) and in Tuscian charters (e.g., ChLA 1.934, AD 752, Valdottavo, Lucca). However, a similar declaration is already attested in a sales contract in Formulae Marculfi 2.20, which suggests that its use was a widespread phenomenon and not particularly bound to a specific document type.

§ 42 The formula (in) libera potestate appears in 14 sales contracts of the Monte Amiata region between AD 787 and 838, and is thus exclusively a southern Tuscian feature. All but one of these expressions of free will, which are clearly more frequent outside Lucca, are in charters grouped under cluster 1. Almost all of them appear to be contaminations or combinations of formulae which perhaps had been used as distinctly defined blocks at some earlier point in time. A particularly rich example is the passage from a sales contract from Pisa in (3):

(3) constant me Sunduald, vir honestus, hac dies arvitrium bonem volumtatis, nullus dominus interveniente neque aliquis me suadente, nisi bono animus meus, vindedisse et vindedi, tradedisse et tradedi tivi Filicausi medietatem de casa meas (ChLA 1.799, AD 720, Pisa)

“it is certain for me, Sunduald, vir honestus, today, by deliberation of good will, [undergoing] no interference of any authority nor persuasion by anyone, but by my well-intentioned soul, to have sold – and I did sell – and to have handed over – and I did hand over – to you, Filicausus, one half of my farmhouse”

§ 43 Apart from expressions of free will, the constat clauses of two sales contracts, written by a scribe called Altipert/Altipertu in Volterra and Massa Marittima in the 740s, incorporate a secutis clause, which alludes to the presence of witnesses who are to subscribe to the document: secutis in presentia testibus qui subter presente chartula rouoraturi sunt “being accompanied in person by the witnesses who will sign the present charter below” (ChLA 1.925, AD 746, Massa Marittima).

§ 44 Constat cluster 2 constitutes a simple template which chiefly contains sales contracts from Lucca (70%) and elsewhere and covers the years AD 746–907, with the median of AD 807 (see Figure 8). Contrary to the infinitive perfect of the duplicate construction in cluster 1, here the verbal complement of constat is typically a finite verb linked either by way of an anacoluthon (4) or introduced by the subordinating conjunction quia or et quia (5).

(4) constat me Pertiperto, abitatore in loco Asilacto, filius quondam Tachiperti, presenti die per hanc cartulam vindo et trado … (ChLA 2.81.6, AD 856, Lucca)

“it is certain for me, Pertiperto, resident in Asilacto, son of the late Tachipertus, on the present day, by way of this charter, I sell and hand over …”

(5) costat me Iohannes, filio bone memorie Petroni, quia vendedit adque tradedit tibi Ansighisi … (ChLA 2.86.44, AD 896, Prato)

“it is certain for me, Iohannes, son of Petrus of happy memory, that I sold and handed over to you Ansighisus …”

Figure 8

Constat cluster 2 template: later sales contracts mainly from Lucca, no duplicate constructions (N = 46).

§ 45 Quia is used from AD 776 onwards, while et quia is used six times in the Monte Amiata region between AD 827 and 907. However, quia/et quia never fully ousts the older, anacoluthic construction. The dispositive verbs have three subtypes: 1) the simple verb pair vendo et trado in the first person (19 times, 41%), attested throughout the entire life span of cluster 2 (AD 746–907), 2) infinitives governed by verbs denoting the evidential aspect, i.e., vendere et tradere videor (10 times, 22%), attested between the 760s and 820s, and 3) vendere et tradere praevideo (14 times, 30%), only attested between the 770s and 800s. The constructions with videor and praevideo “to seem” are attested almost exclusively in Lucca. The practically synonymous verbs vendo and venumdo were merged for the present discussion.

§ 46 The instrumental specification per hanc chartulam (70%) is an intrinsic part of the template, as it does not occur in clusters 1 or 3. The same can be stated, albeit to a lesser degree, about the temporal specifications (a) presenti die “from the present day” and praesenti “presently,” with a joint relative frequency of 54%. Per hanc chartulam links cluster 2 to the manifestus clause, where the same specification, together with per chartulam, is common. Thus, it looks as if this instrumental specification was introduced in northern Tuscia during the time span examined; it is not attested in the Monte Amiata region, not even with manifestus. The first occurrence of per hanc chartulam is from AD 773, while a rare synonym, per hanc paginam, is found in AD 768 (both occur with manifestus from AD 766 onwards). Thus, the onset of the instrumental specification coincides by and large with the emergence of the praevideo construction – and with several other changes in formula use.

§ 47 In cluster 2, the dispositive verb is usually in the present tense but seven times in the past tense, four of these cases being in the Monte Amiata region between the 820s and the 870s. This distinguishes cluster 2 from cluster 1, in which the verbs are always in the past tense. The chronological change in tense use between the older cluster 1 and the later cluster 2 may reflect a change in how the physical act of writing the charter was understood with respect to the legal act: whether the act was considered to take place through the writing of the charter (present tense) or whether it had taken place before it was recorded in the charter (past tense).

§ 48 Cluster 3 is the smallest constat cluster (see Figure 9). It can be characterized as a residual class because, in addition to 12 (31%) sales contracts, i.e., the predominant document type of the constat clause, it includes all those elements that do not belong to sales contracts proper. As was seen in Figure 5, cluster 3 also includes various lease contracts with 16 exemplars (41%, including 5 charta libellaria leases), donations with 7 (18%), and exchanges with 4 exemplars (10%).

Figure 9

Constat cluster 3 template: leases, donations, exchanges, and stray formulae (N = 39).

§ 49 Cluster 3 includes a few duplicate constructions with verbs, such as repromitto “to promise,” suscipio “to receive,” firmo “to confirm,” dono “to donate,” and concedo “to concede,” but not in sales contracts, which fall in cluster 1 with their vendidisse et vendidi. The duplicate constructions with dono and concedo are related to donations, but donations may also have simple dispositive verbs, such as dedi (23%) and tradidi (18%). Dedi and tradidi are also used in other document types than donations.

§ 50 Cluster 3 cases with duplicate construction are relatively early, all from the eighth century, except for ChLA 2.63.11 from AD 866 (Chiusi). However, the main subordination strategy is the conjunctions meaning “that”: quia, sometimes preceded by et, and qualiter (51% together). In cluster 3, qualiter is strongly associated with the Monte Amiata region and et quia with Pistoia and Pisa. The duplicate repromisisse et repromisi occurs in two lease contracts where the recipient promises to take care of the property leased to him, in (6) even at the expense of his personal freedom. These cases with older duplicate constructions are often preceded by causal adverbs or phrases, such as ideo “therefore” and Deo auctore “by God’s authority.” For the rest, constat cluster 3 only contains a few specifying phrases that are frequent enough to be visualized in the template: the temporal specifications (a) hac die (8%) and (a) praesenti die (5%) are the prominent ones.

(6) ideo Deo auturem cunstat me Aunefrid, vir venerabilis, clirico, ac die repromisse et repromisi me servire ad beato sancto Laurentio et sancti Valentini, amturi [= emptori] meo, cum homnia ris mea (ChLA 1.897, AD 720, Lucca)

“therefore, by God’s authority, it is certain for me, Aunefrid, vir venerabilis, clerk, to have promised today – and I did promise – that I shall serve the [church of] blessed St. Laurence and St. Valentine, my purchaser, with all my property”

§ 51 The verbs firmo and confirmo appear mostly in land lease contracts, which are organized either as chartae libellariae (7) or as traditional leases without the mention of libellary condition (8). Although chartae libellariae are introduced by manifestus in northern Tuscia (see below), they are opened by constat (five cases between AD 816 and 866) within the Monte Amiata sphere, where manifestus is scarcely used.

(7) constat me Auduald abbas, rector et deserviens monasterio Domini Salvatori sito monte Amiate … per hunc livello confirmo in vos Lupo et Suaipert … in casa et res et sorte predicto monasterio (ChLA 2.61.28, AD 818, Palia, Monte Amiata region)

“it is certain for me, Auduald, abbot, rector and servant of the monastery of Lord Savior at Monte Amiata … by this libellus, I confirm to you, Lupo and Suaipert … in a house and property and portion of the afore-mentioned monastery”

(8) consta me Guntifridi, virum devotum, filio quondam Tati exercitalis Clusine civitatis, hac die firmasse et firmavi te Auderado, filio Querini, in medietate de casa (ChLA 1.747, AD 772, Roselle, Grosseto)

“it is certain for me, Guntifridi, vir devotus, son of the late Tatus, officer of the city of Chiusi, to have confirmed today – and I did confirm – you, Auderado, son of Querino, in half a house”

§ 52 In the sales contracts of cluster 3, contrary to all other cases seen thus far, constat either opens an assurance clause, or it occurs twice, first to head the dispositive verb vendidi (including venumdavi, 5 times) and later to open an assurance clause that assures that the vendor has received the selling price (Ghignoli and Bougard 2011, 290). This is most typically expressed by constat me suppletum esse “it is certain for me to be satisfied” (13%) (9) or with the verb suscipio “to receive,” in which constat is often preceded by a linking adverb, such as et “and” or unde “thus.” The single verb suscipio (8%) corresponds to the last three cluster 3 occurrences classifiable as sales contracts (see Figure 5). They are written in Lucca in the 810s by the scribes Altifonsus (ChLA 2.74.4, in AD 813) and Rumualdus (ChLA 2.74.38 and ChLA 2.74.42, both in AD 819); they all contain an assurance clause related to a trade contract which involves a pledge and could, therefore, possibly be described as leases as well. For these reasons, and because cluster 3 also includes a few general, only slightly formulaic, narrative sentences introduced by constat that serve as the background to the disposition proper and do not mention the other party (10), the addressee name, in this cluster preceded by tibi “to you” or a te “from you,” is only mentioned in 70% of the cases. The sentence in (10) is an arenga that records the religious-ethical motivation of the pious donation, introduced by constat. In 10% of the cluster 3 cases, a charter also includes a pleonastic manifestus sum/manifestum est element. All the sentences with both manifestus and constat are lengthy narratives of events preceding the actual transaction or arenga-like descriptions of its ethical motivation, like in the anacoluthic sentence in (11).

(9) unde constat me qui supra Vuillerad clericum suscipisset et suscipi ad te Crispino pro ipsa suprascripta terra pretium placitum et in defenitum capitulu auris soledos nomero vigenti tantum (ChLA 1.933, AD 752, Lucca)

“thus, it is certain for me, the above-mentioned Vuillerad, clerk, to have received – and I did receive – from you, Crispino, for the above-mentioned land, the fixed and in legal terms predetermined price of twenty gold solidi in total”

(10) constat me Arnipert clericus avitator in Feronianu, qualiter credo superna speratio in me evenisse, ut seculi fragilitas relinquere et Dei omnipotentis me subdere (ChLA 1.762, AD 790, Laucinianu, Monte Amiata region)

“it is certain for me, Arnipert, clerk, resident in Feronianu, that I believe that divine inspiration has fallen on me in order that I abandon the flimsiness of the world and submit myself to Almighty God”

(11) ideoque consta me Walderam filius quondam Silperadi de Cosuna manifestum est seo quod tu Rado filius meus mihi in mea senecta multa erga <me> inpendere visus est (ChLA 1.756, AD 777, Cosona, Siena)

“therefore, it is certain for me, Walderam, son of the late Silperadus of Cosuna, it is manifest and that you, Rado, my son, seem to have expended much for me in my old age”

§ 53 Cluster 3 covers the entire time span of constat clauses from AD 720 to AD 951 but becomes clearly less frequent towards the end, as indicated by the median of AD 788. This can be taken to reflect the general unification and standardization of documentary formulae in the ninth century, also in terms of spelling and grammar: the variation present in cluster 3 was increasingly canalized into a few surviving document types with an increasingly fixed element repertoire. As Figure 5 showed, cluster 3 as a whole is not particularly determined geographically, although single formulae belonging to it can be markedly local.

6.2. Manifestus templates

§ 54 With the manifestus clause, the author of the charter, from whose viewpoint the wording is always composed in Tuscian charters, can either receive or deliver the object of transaction. Consequently, the dispositive verbs and their pronominal subjects can be either in the first or the second person. In addition, and independently of the viewpoint, the manifestus construction may be personal (manifestus sum) or impersonal (manifestum est). For example, with the verb do “to give” (see clusters 4 and 6), the alternatives are

manifestus sum ego/manifestum est mihi NN1 quia tu NN2 dedisti mihi (NN1),

“it is manifest to me, NN1, that you, NN2, gave me (NN1)”

manifestus sum ego/manifestum est mihi NN1 quia (ego NN1) dedi tibi NN2,

“it is manifest to me, NN1, that I (NN1) gave to you, NN2”

where NN1 stands for the author of the charter, in the first person, and NN2, the addressee, i.e., the other party, in the second person.

§ 55 The typical locations of the author/addressee data are again marked in the templates. Whether the commissioner passively receives or actively delivers the object of transaction can be important for diplomatic or historical studies. However, this distinction is not encoded in the present templates, which are only based on lemma clustering. Thus, the manifestus templates do not give the percentages of addressee names, except for cluster 3, while the author name is always present with the manifestus sum/manifestum est part of the formula. Instead, the templates visualize two variables not included in the cluster analysis: a morphosyntactic variable that indicates whether the construction is used personally or impersonally, and a morphological variable that indicates whether the actual form with personal constructions is manifestu instead of manifestus/-um, a feature which shows systematic variation in time and place. This information is easily derived from the treebank annotation for the LLCT1–2 instances.

§ 56 Manifestus cluster 1 constitutes a general-purpose template which contains various elements and can be used with most document types (see Figure 10). The cluster covers the years AD 728–919. However, 87% of the occurrences are prior to AD 810 and the median is AD 788. Cluster 1 includes donations (58%), leases (14%), ordination contracts (12%), sales contracts (6%), dispensations (4%), and a few eighth-century exchanges (4%). The exchanges are directly linked to the 4% of convenit “it was convenient”/“it was agreed” and (in) viganeum/cambium/commutationem “in exchange,” the distinctive elements of exchange contracts. Only chartae libellariae are not associated with cluster 1.

Figure 10

Manifestus cluster 1 template: older multi-purpose pan-Tuscian formula, mostly donations, impersonal manifestum est (N = 171).

§ 57 Cluster 1 has the lowest share of the personal manifestus sum construction (12%) of all the manifestus clusters; the impersonal manifestum est is used in 88% of cases. The form manifestu (est) is rarely used, likely to avoid a collision of word-final and -initial vowels. The main subordination strategy is the conjunction quia (91%), but the few manifestus clauses with (eo) quod “that” (4%) and qualiter (2%) are also grouped under cluster 1. The instrumental specifications per/propter (hanc) chartulam are quite frequent (together 53%), but not as frequent as with most other clusters.

§ 58 Cluster 1 is not clearly centred around one or two specific dispositive verbs but makes use of a wide variety of different verbs. The temporal specifications ante hos annos/dies “before these years/days” (together 10%) ensue from the fact that cluster 1 contains several manifestus clauses that introduce lengthy narrative descriptions of the circumstances of the transaction. For example, someone has previously constructed (construo/aedifico/fabrico (21%), constituo/instituo (5%)) a church or a monastery, to which he/she subsequently donates some property (e.g., offero (35%), do (11%)) or ordains a priest (ordino (11%), sometimes firmo/confirmo (together 16%)) (see the sentence in (12)). Likewise, the phrase pro remedio animae NN (15%) or the variant pro animae NN remedio (6%), where NN signifies a possessive pronoun meae “my” or a personal name, introduce the arenga of the donation. Peto (6%) and rogo (6%) (“to ask”) typically introduce a petition by a person who wishes to be ordained priest in a church. Note that the dispositive verbs that occur in such background narratives do not necessarily have anything to do with the transaction carried out in the charter, whose dispositive verb is often expressed by a subsequent formula not introduced by manifestus. Because the actors that appear in the background narratives can be in the 3rd person and the verbs in the past or present tense, the (dispositive) verbs of the cluster 1 template are given as their lemmas; the same applies to cluster 2 as well. However, the most typical form is the 1st-person singular perfect, like in most other clusters.

§ 59 The presence of narratives also makes cluster 1 less clearly associated with specific document types than other clusters. In the donation of (12), the manifestus clause serves to explain the status of the priest Deusdona, who, a couple of sentences later, donates the mentioned church to a certain Alpert, with the dispositive part of the charter introduced by propterea “therefore,” not by manifestus.

(12) manifestum est mihi Deusdona presbitero filio quondam Filicausi, rector ecclesie sancti Angeli da Isgragio, quia quondam Teudoraci presbitero et quondam Alitrodula, Adosia, Teutperga germane et Dei ancilli filie quondam Laudici ante hos plurimos annos confirmavirunt me per dotis titulo in suprascripta ecclesia sancti Michaeli arcangeli (ChLA 1.1072, AD 780, Lucca)

“it is manifest to me, Deusdona, priest, son of the late Filicausus, rector of the church of St. Archangel of Isgragio, that the late Teudoraci, priest, and the late sisters Alitrodula, Adosia, and Teutperga, God’s maidservants, daughters of the late Laudicus, numerous years ago, confirmed me, by way of donation, in the above-mentioned church of St. Michael Archangel”

§ 60 Finally, cluster 1 is the only cluster with many occurrences of praevideo (16%), which, in contrast to constat, is not particularly frequent with manifestus. The cluster 1 cases of praevideo are almost all from Lucca of the 760s to 780s.

§ 61 Manifestus cluster 2 (see Figure 11) is a chronologically advanced version of cluster 1. Although it also includes early occurrences, especially with donations, 81% of the cases are posterior to AD 810 (cf. cluster 1), with an overall range of AD 737–995 and a median of AD 842. The main feature that distinguishes cluster 2 from cluster 1 is the high percentage (92%) of the personal manifestus sum construction, compared to the 12% of cluster 1. Also, in cluster 2, 69% of the occurrences of the lemma manifestus are in the form manifestu, compared to the 2% of cluster 1. The instrumental specifications per chartulam and per hanc chartulam (together 72%) are more frequent than in cluster 1 (together 49%). These are relatively late features, with the personal construction, the form manifestu, and per chartulam beginning to appear frequently from around AD 810, per hanc chartulam already from the 760s. The relative frequencies of the phrases pro remedio animae NN (6%) and pro animae NN remedio (15%) are reversed compared to cluster 1 (15% and 6%, respectively), which reflects the chronological distribution of the two phrases: most occurrences of the former are found before AD 825, most occurrences of the latter after that year (for a third, infrequent, option, pro NN animae remedio, see [13]). The chronological difference between clusters 1 and 2 is essentially bound to the chronological distribution of the above-mentioned elements.

Figure 11

Manifestus cluster 2 template: later multi-purpose pan-Tuscian formula, personal manifestus sum (N = 142).

§ 62 A narrower range of verbs is used in manifestus cluster 2 than in cluster 1. The same verbs are mainly associated with the same document types as in cluster 1. The lower share of verbs of giving, do and offero (together 30%, in comparison to the 46% of cluster 1), reflects a less accentuated presence of donations (35%). The other document types of cluster 2 are sales contracts (25%), dispensations (18%), ordination contracts (10%), and repromissions (4%). Almost all the manifestus dispensations, ordinations, and repromissions belong to clusters 1 and 2. Cluster 2 contains only isolated examples of chartae libellariae and exchanges.

§ 63 The selling verbs vendo (21%), venumdo (15%), and trado (20%) are more frequent than in cluster 1. This is due to frequent sales contracts, but, in part, also to other document types that mention selling although the contract proper is of another type. The same applies to the verb offero, which is also used, for example, in dispensations, as in (13), where the real dispositive verb, offerre (previdemus) “we (intend to) offer,” is found in the sentence following decrevit (decerno “to decree,” 18%).

(13) manifestum est nobis Agiprando presbitero et Amico presbitero seu et Ropperto clerico quia quondam Deusdedit presbiter per cartulam decrevit casam et rem suam in nostra potestatem, ut nos eam pro illius animae remedio dispensare deverimus; propterea per hanc cartulam pro remedio animae iam dicti Deusdedi presbiteri offerre previdemus Deo et tibi ecclesie beati sancti Nazarii … (ChLA 1.1107, AD 787, Lucca)

“it is manifest to us, Agiprando, priest, Amico, priest, and Ropperto, clerk, that the late Deusdedit, priest, decreed by charter his house and property into our possession so that we should dispense it for the remedy of his soul; therefore, by this charter, we intend to offer, for the remedy of the soul of the afore-mentioned Deusdedit, priest, to God and to you, the church of blessed St. Nazarius …”

§ 64 Cluster 3 is the smallest manifestus cluster and the only one that matches almost perfectly with a single document type, i.e., exchange contracts (99%) (see Figure 12). The first occurrence of cluster 3 is from AD 761 and the last one from AD 988, with a median of AD 855. The exchange formula revolves around a simple set of elements with little lexical variation. The formula has two subtypes: the inter me et te “between me and you” variant (14) is early (19%), up to AD 809, except for one later occurrence, while the una tecum “together with you” variant (15) is later (81%), from the ninth and tenth centuries, with some attestations in the eighth century. The construction is usually personal (76%), and the form is manifestu in 68% of the cases, which results from the cluster being relatively late.

(14) manifestum est mihi Rachiprando presbitero rectori ecclesiae sanctae Mariae sitae in Sexto quia convenit inter me et te Baruttulam clericum filium quondam Baruccioli ut cambium inter nos de aliquantula re facere deberimus (ChLA 1.1052, AD 775, Lucca)

“it is manifest to me, Rachiprando, priest, rector of the church of St. Mary located in Sexto, that it was agreed between me and you, Baruttula, clerk, son of the late Barucciolo, that we should make an exchange of some property between us”

(15) manifestu sum ego Hildeprandus, in Dei nomine comis, filio bone memorie Heriprandi, quia convenit mihi una tecum Hieremias, gratia Dei huius sancte Lucane ecclesie humilis episcopus, germano meo, ud inter nos de aliquantis casis et rebus in comutationem facere deberimus, sicut et factum est (ChLA 2.81.38, AD 862, Lucca)

“I, Hildeprandus, count in the name of God, son of Heriprandus of happy memory, manifest that it was agreeable to me, together with you, Hieremias, by God’s grace the humble bishop of this holy Luccan church, my brother, that we should make an exchange of some houses and property between us, as it was also done”

Figure 12

Manifestus cluster 3 template: exchange contracts (N = 91).

§ 65 The convenit clause, always subordinated to manifestus sum/manifestum est by quia, is followed by a complement clause with ut. The latter contains the lexeme denoting exchange: commutatio, viganeum, or cambium. Cambium (7%) is the oldest, used mainly from the 760s to the 770s (14); it is followed by viganeum (18%), which was used at the turn of the eighth and ninth centuries (Bougard 2013). The most frequent term, commutatio (75%), first appears in AD 806. It soon supplants the non-classical viganeum and cambium and continues until the last cluster 3 occurrence in AD 988. This change can be interpreted as an aspiration towards more classical vocabulary. The object of exchange, de aliqua/aliquanta/aliquantula re “of some property”, appears in 75% of cases; the generic word res “property” is sometimes replaced by a more specific expression, as in de aliquantis casis et rebus “of some farmhouses and (their) belongings” in (15). In 67% of the cases, the exchange formula is concluded by the summary phrase sicut et factum est “as was also done” (see [15]).

§ 66 Note that seven exchange contracts that do not follow the cluster 3 pattern are included in cluster 1. Except for one ninth-century occurrence, they date from the 760s to 780s, and their dispositive verb is a verb of giving: for example, do in cambium/viganeum “to give in exchange.” The cluster 3 exchange type is predominantly Luccan (90%). Six cases (7%), mainly from the late ninth and early tenth centuries, come from the neighboring Pisa region, which suggests an increasing Luccan influence on Pisan documentary production.

§ 67 The last three manifestus clusters concern lease contracts. Clusters 4 to 6 differ from each other chiefly in their dispositive verbs (dedisti/dedi vs. firmasti/firmavi), usually in the second and sometimes in the first person, and in the final gerund/gerundive clause (ad laborandum/continendum “to be cultivated/maintained” vs. ad censum persolvendum/perexsolvendum “for paying the rent”), which together express the “feudal” bond concluded between the lessor and lessee. Differences are also seen in the ratio of the instrumental specification per chartulam vs. per hanc chartulam and in the frequency of the phrase libellario nomine/ordine. These differences are in part tied to chronological change. Table 3 shows that the manifestus clusters 4 to 6 include almost exclusively lease contracts. In the present study, lease contracts are divided into two groups, chartae libellariae and other leases, based on whether their text contains a reference to their being libelli or not.

Table 3

The document types of the manifestus clusters 4 to 6.

Document type | cluster 4 (N = 163) “dedisti ad laborandum” | cluster 5 (N = 115) “libellario nomine firmasti” | cluster 6 (N = 186) “dedisti ad censum per(ex)solvendum”
charta libellaria | 75% | 96% | 95%
other lease | 24% | 4% | 5%

§ 68 As Table 3 shows, three fourths of cluster 4 are chartae libellariae and one fourth other leases. The first occurrence is from AD 773 and the last from AD 990, but 94% of the occurrences date to the ninth century, with a median of AD 846. The non-libellary leases are somewhat earlier, with a median of AD 827, while the chartae libellariae are later, with a median of AD 853.
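The chronological figures quoted for each cluster (first and last occurrence, median, share falling in a given century) are simple descriptive statistics over the dates of the charters assigned to that cluster. A minimal sketch of such a summary is given below; the function name `chronological_profile` and the toy list of years are hypothetical illustrations and are not taken from the published dataset.

```python
from statistics import median

def chronological_profile(years, start=801, end=900):
    """Summarize the dating of the charters assigned to one cluster.

    `start` and `end` delimit the period of interest (here the ninth century).
    """
    in_range = sum(1 for y in years if start <= y <= end)
    return {
        "first": min(years),
        "last": max(years),
        "median": median(years),
        "share_in_range": round(100 * in_range / len(years)),
    }

# Hypothetical years (AD) for one cluster; invented for illustration only.
toy_years = [773, 820, 835, 846, 846, 853, 861, 879, 990]
profile = chronological_profile(toy_years)
print(profile)
```

Running the same function separately on the chartae libellariae and on the other leases of a cluster would yield the kind of comparison of medians reported above.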

§ 69 Cluster 4 (see Figure 13) is characterized by the dispositive verb dedisti/dedi combined with ad laborandum/continendum (47%) or with ad censum per(ex)solvendum (12%), which is particularly frequent in cluster 6. In 5% of the cases, the verb offero is used along with do. The sentence in (16) represents the most typical wording. The personal construction manifestus sum (85%) is not yet as frequent in cluster 4 as it will be in cluster 6 (100%). Likewise, the percentage of libellario nomine/ordine is lower (74%) than in clusters 5 and 6, as is the case with the instrumental specification per chartulam (72%).

(16) manifestu sum ego Bonighisi, filio quondam Teuduli, quia tu Berengarius, gratia Dei humilis episcopus, per cartula livellario ordine a lavorandum et gubernandum seo meliorandum dedisti mihi … (ChLA 2.77.39, AD 840, Lucca)

“I, Bonighisi, son of the late Teudulo, manifest that you, Berengarius, humble bishop by God’s grace, by charter, in terms of a libellus, gave me to be cultivated and governed and ameliorated …”

Figure 13

Manifestus cluster 4 template: chartae libellariae with the phrase dedisti ad laborandum and other lease contracts (N = 163).

§ 70 Cluster 5 predominantly contains chartae libellariae (96%) (see Figure 14). Cluster 5 is roughly contemporary with cluster 4, from which it differs in terms of the dispositive verb, firmo (97%), and the absence of gerund/gerundive clauses. Cluster 5 is, however, chronologically even more concentrated than cluster 4: almost 97% of the occurrences date from the ninth century (median AD 847), with only one occurrence from the eighth century, from AD 787, and three from the tenth, the last one from AD 911.

Figure 14

Manifestus cluster 5 template: chartae libellariae with the phrase libellario nomine firmasti (N = 115).

§ 71 The proportions of the personal manifestus sum construction (90%) and of the specification per chartulam (84%) are higher than in cluster 4 but lower than in cluster 6. Libellario nomine/ordine (97%) is more frequent than in cluster 4 and about as frequent as in cluster 6. The lessee is again the receiver of the transferred property, hence the prevalence of the second person (17), as in cluster 6.

(17) manifestu sum ego Amalfridi filio quondam Fulprandi quia tu Petrus gratia Dei huius sancte Lucane ecclesie humilis episcopus per cartulam livellario nomine firmasti me id est in casa et res … (MED 1064 in LLCT3, AD 903, Lucca)

“I, Amalfridi, son of the late Fulprando, manifest that you, Petrus, by God’s grace the humble bishop of this holy Luccan church, by charter, in terms of a libellus, confirmed me in the house and property …”

§ 72 Like cluster 5, cluster 6 (see Figure 15) also contains predominantly chartae libellariae (95%). Cluster 6 differs from clusters 4 and 5 in that it employs the gerundive clause ad censum persolvendum/perexsolvendum (100%), and it has the same dispositive verb as cluster 4, dedisti/dedi (99%). Cluster 6 is chronologically the most advanced of all the manifestus clusters: the first occurrence is from AD 830 and the last from AD 999, with a median of AD 885. The cluster is very frequent in Lucca but also appears in the neighboring Pisa region after AD 874 and throughout the tenth century (see Figure 6), which again suggests a Luccan influence on Pisan documentary production.

Figure 15

Manifestus cluster 6 template: chartae libellariae with the phrase dedi ad censum per(ex)solvendum (N = 186).

§ 73 The high frequency of this rather rigid type of charta libellaria with little variation in its template makes it appear as an independent cluster (18). If five clusters were produced instead of six, the present cluster 6 would merge into cluster 4, both with dedisti/dedi. All the charters included in cluster 6 display the personal manifestus sum construction (100%), but almost all other elements also show a percentage of 100% or close to it. Even the specification per chartulam appears in each charter, and libellario nomine/ordine in 95% of them. The cluster also shows the highest share of the form manifestu (84%), a feature that became more and more frequent over time.

(18) manifestu sum ego Cumputo filio bone memorie Lei quia tu Raimbertus episcopus gratia Dei huius sancte Pisane eclesie umilis episcopus per cartula livellario nomine ad censum perexolvendum dedisti mihi … (AAP 69, AD 994, Pisa)

“I, Cumputo, son of Leo of happy memory, manifest that you, Raimbertus, by God’s grace the humble bishop of this holy Pisan church, by charter, in terms of a libellus, gave me for paying the rent …”
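The remark that a five-cluster solution would fold cluster 6 into cluster 4 can be illustrated with a toy agglomerative procedure: when fewer clusters are requested, the two groups that share the most elements are merged first. The sketch below is only an illustration under stated assumptions. It uses plain average-linkage clustering with a Hamming distance on invented binary vectors, which is not the clustering procedure used in the study, and all names and data in it are hypothetical.

```python
def hamming(a, b):
    """Share of formula elements on which two charters differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(data, k):
    """Merge the two closest clusters until only k remain.

    Returns the clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        pairs = [
            (average_linkage(clusters[a], clusters[b], data), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)  # closest pair; ties resolved by index order
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Invented presence/absence vectors for four elements:
# [dedisti, firmasti, ad laborandum, ad censum persolvendum]
toy = [
    [1, 0, 1, 0], [1, 0, 1, 0],  # "dedisti ad laborandum" type
    [0, 1, 0, 0], [0, 1, 0, 0],  # "firmasti" type
    [1, 0, 0, 1], [1, 0, 0, 1],  # "dedisti ad censum persolvendum" type
]
three = agglomerate(toy, 3)
two = agglomerate(toy, 2)
print(three)
print(two)
```

With three clusters requested, the three toy types stay apart; with two, the two dedisti types are joined because they differ in fewer elements than either does from the firmasti type.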

6.3. The variation of constat and manifestus clauses and historical change

§ 74 The motivation for developing the above digital diplomatic framework for quantifying intra-formula variation was to serve corpus-based historical-philological research on medieval Latin textual culture. While the historiographical contextualization of the patterns observed in the constat and manifestus templates will be left for other studies, this subsection briefly discusses some patterns of the constat and manifestus clauses that are likely to reflect historical change in document production.

§ 75 Amelotti and Costamagna (1975, 182) interpret Charlemagne’s capitulary (MGH Capit. 1.81.13) of ca. AD 810, which barred priests from charter writing, as an attempt to bring private document production under imperial control. As for Tuscia, Keller highlights that, with the accession to power of the first Frankish count Boniface I in Lucca in AD 812 or 813, the local clergy was effectively excluded from public administration. Indeed, while most Luccan charters were written by ecclesiastical scribes under the Lombard kingdom (until AD 774) and further until the end of the eighth century, most charters posterior to the 810s are the work of lay notaries (Keller 1973, 120–124). This shift in documentary production is also visible in the spelling and grammatical features of charters, which become more uniform during the ninth century, potentially an achievement of the Carolingian reformatory endeavors (Korkiakangas 2018, 586–587; Korkiakangas 2023, 248).

§ 76 As a whole, Tuscian charter production becomes professionalized and standardized from the 810s onwards (Witt 2012, 62–64; Mailloux 2008, 21–27), and charter formulae seem to echo this development: for example, the considerable variation present in constat clauses in the eighth century (see the constat cluster 3) is channelled into a few surviving document types with an increasingly fixed element repertoire in the ninth century. Likewise, the decline of the multifaceted multi-purpose manifestus cluster 1 by the early ninth century suggests that, from that time onwards, document types become formally more and more clear-cut: in comparison to the earlier patchwork templates, documentary types focus on a few decisive lexemes and collocates, which come to be their hallmarks. This is also visible in the templates: the later a template is, the more concise it is. The fact that even the spelling and grammar of the charters improve suggests that the overall decrease in variation is not likely to be an optical illusion caused by the increased rate of charter production (Korkiakangas 2023).

§ 77 The replacement of Luccan scribes around AD 812/813 seems to be tied to variation in specific constat and manifestus formula elements: the rise of the personal manifestu(s) sum construction, the precise form manifestu, the instrumental specification per chartulam, and the classical Latin commutatio instead of non-classical viganeum or cambium are to be dated to the 810s, whereas praevideo almost disappears by that decade. On the other hand, the very charta libellaria as a document type seems to be of Frankish origin and increases sharply in number after AD 800; certain formula elements, such as libellario ordine/nomine, are direct symptoms of changing legal conventions, in this case suggesting new practices in making agrarian contracts (Andreolli and Montanari 1983, 85–94; Mailloux 2008, 23–24).

§ 78 Indeed, Korkiakangas observes that the spelling of the charta libellaria type lease contracts is rather faithful to the ancient Latin spelling compared to most other document types (Korkiakangas 2023, 246–247). On this basis, it can be hypothesized that Frankish authorities initially circulated model chartae libellariae written in a reformed spelling in the early ninth century, hence their overall uniformity in comparison with other document types whose textual tradition was centuries-old and consequently more confused. Regardless of whether such model charters existed or not, it is plausible to assume that the smaller number of copy generations intervening between the “original” and the surviving exemplars explains at least part of the relative uniformity of the charta libellaria formula repertoire.

7. Conclusions and future perspectives

§ 79 The present study introduced a framework to quantify and manage documentary formulae in large charter corpora in terms of text reuse templates. It tested the framework on two showcase formulae, the constat and manifestus clauses. An unsupervised statistical classification technique, cluster analysis, was used to reduce the variation of 32 constat and 39 manifestus formula elements within a corpus of 1,283 charters into a few clusters manageable by the human mind: these clusters were identified with subtypes of document formulae. Templates based on clusters proved a viable method of presenting formula variation because they could be easily compared to each other and related to metadata variables to detect what was typical of a certain place and time. Once templates had highlighted the big picture of variation of the two formulae, it also became easier to notice when it was necessary to examine the chronological, geographical, and document type distributions of individual formula elements.
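The step from cluster assignments to templates can be pictured as a per-cluster tabulation of element frequencies: for each cluster, the share of charters containing each formula element is computed, and the most frequent elements make up the template. The following is a minimal sketch under stated assumptions: the function name `cluster_templates`, the 50% cut-off, and the toy rows are hypothetical, and the element percentages are invented rather than drawn from the published datasets.

```python
from collections import defaultdict

def cluster_templates(rows, labels, threshold=50):
    """Per cluster, return the elements present in at least `threshold` percent of charters.

    rows   : list of dicts mapping element name -> 0/1 (absence/presence in a charter)
    labels : cluster label of each row, in the same order
    """
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        sizes[label] += 1
        for element, present in row.items():
            counts[label][element] += present
    templates = {}
    for label, size in sizes.items():
        shares = {e: round(100 * n / size) for e, n in counts[label].items()}
        templates[label] = {e: p for e, p in shares.items() if p >= threshold}
    return templates

# Invented toy data: eight charters, three formula elements, two clusters.
rows = [
    {"manifestus sum": 0, "per chartulam": 1, "praevideo": 1},
    {"manifestus sum": 0, "per chartulam": 1, "praevideo": 0},
    {"manifestus sum": 0, "per chartulam": 0, "praevideo": 0},
    {"manifestus sum": 1, "per chartulam": 0, "praevideo": 0},
    {"manifestus sum": 1, "per chartulam": 1, "praevideo": 0},
    {"manifestus sum": 1, "per chartulam": 1, "praevideo": 0},
    {"manifestus sum": 1, "per chartulam": 1, "praevideo": 0},
    {"manifestus sum": 1, "per chartulam": 0, "praevideo": 0},
]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
templates = cluster_templates(rows, labels)
print(templates)
```

Because each template is just a mapping from elements to percentages, two templates can be compared element by element, or cross-tabulated against metadata such as date and place of writing.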

§ 80 Constat clauses were clustered in three cluster templates in terms of document type (sales contract clusters 1 and 2 vs. multi-purpose cluster 3), the duplicate verbal construction, which, for its part, is highly determined chronologically (older sales contract cluster 1 vs. later sales contract cluster 2 and cluster 3), and geography (later sales contract cluster 2 being mostly related to Lucca). The more frequent manifestus clauses were clustered in six cluster templates, which can be further grouped into three broader types based on the dominant document type: multi-purpose clusters 1 and 2 include various document types, while cluster 3 is an exchange cluster and clusters 4 to 6 lease clusters. Clusters 1 and 2 differ from each other in that cluster 1 is older than cluster 2 and, accordingly, the former is characterized by the impersonal manifestum est construction and the latter by the personal manifestus sum. They are also the only clusters that are distributed throughout Tuscia, although the manifestus clause is mostly concentrated in Lucca. Lease clusters 4 to 6, too, are chronologically differentiated (older clusters 4 and 5 vs. later cluster 6). The dispositive verb of cluster 5 is firmo, whereas that of clusters 4 and 6 is do, followed by a final gerund clause ad laborandum in cluster 4 and by ad censum per(ex)solvendum in cluster 6.

§ 81 The distributions of the templates and individual elements can sometimes be associated with specific historical events, as in the case of the 810s administrative reform in Lucca, initially sparked by the Carolingian conquest of Tuscia (section 6.3). Indeed, a systematic analysis of the variation in a broader range of formulae is likely to shed light on thus far unnoticed historical changes in document production. The approach may, for example, help in sketching a more detailed image of the actual mechanisms underlying the implementation of the Carolingian educational reforms in Italy (Bartoli Langeli 2006, 30).

§ 82 In sum, although quantitative and digital methods cannot replace the traditional approach based on comparison and close reading, they can complement it by illustrating what is common in large datasets and how features pattern in time and place. In the end, the approach sketched here may also promote systematic reconstruction of formulae on a large scale: once all the relevant medieval (Italian) charter collections have been edited and turned into digital format in the future, these methods will make it possible to reconstruct meticulously all the charter formulae and their variants at the desired level of granularity. This will help in drawing panoramic conclusions on how Italian Latin charter formulae – and, more broadly, Latin formulae all over Europe – evolved over centuries and what kind of supraregional influences were at play, thus complementing the immense work done with traditional qualitative historical-comparative methods by scholars such as Tjäder, Zielinski, and Ghignoli. Moreover, large-scale lemmatization and morphological and syntactic parsing of charter data will make it easier to fine-tune classification analyses by taking syntax and morphology better into consideration, for example, the person and the tense of the dispositive verb, features that may have specific chronological and geographical distributions. In the future, it will also be interesting to relate formula templates to the intra-writer variation of individual scribes: Did they adhere to a fixed formula repertoire instead of utilizing similar templates eclectically? Is diachronic change in formula use visible during a scribe’s career?


The constat and manifestus datasets: https://doi.org/10.5281/zenodo.8413246.


The research presented in this paper was funded by the Academy of Finland (grant no. 315176).

Competing interests

The author has no competing interests to declare.

Editorial contributors

Recommending Editors

       James Buffington Harr III, Christian Brothers University, USA

       Gustavo Fernandez Riva, University of Heidelberg, Germany

Recommending Referees

       Dominique Stutzmann, Institut de recherche et d’histoire des textes (IRHT), France

       Nicholas Perreaux, Huygens Institute for the History of the Netherlands, Netherlands

Copy and Layout Editor

       Morgan Pearce, Journal Incubator, University of Lethbridge, Canada


Primary sources

AAP = Carte dell’Archivio arcivescovile di Pisa, Fondo arcivescovile, vol. 1 (720–1100), (Ghignoli 2006).

Cartularium = Cartularium: Recueil des chartes du prieuré de Saint-Bertin, à Poperinghe, de ses dépendances à Bas-Warneton et à Couckelaere, (d’Hoop 1870).

CDA = Codex diplomaticus Amiatinus, erster Band: von den Anfängen bis zum Ende der Nationalkönigsherrschaft (736–951), (Kurze 1974).

ChLA 1 = Chartae Latinae Antiquiores: Facsimile-edition of the Latin Charters Prior to the Ninth Century, (Bruckner et al. 1954–2001).

ChLA 2 = Chartae Latinae Antiquiores: Facsimile-edition of the Latin Charters, 2nd Series: Ninth Century, (Cavallo et al. 1997–2019).

CIL 3 = Corpus Inscriptionum Latinarum, vol. III: Inscriptiones Asiae, provinciarum Europae Graecarum, Illyrici Latinae, (Mommsen 1873).

CDC 9 = Codex diplomaticus Cavensis, vol. IX (1065–1072), (Leone and Vitolo 1984).

CDC 10 = Codex diplomaticus Cavensis, vol. X (1072–1080), (Leone and Vitolo 1990).

CDLM Lenno = Le carte dei monasteri di S. Maria dell’Acquafredda di Lenno e di S. Benedetto in val Perlana (1042–1200), (Pezzola 2011). In Codice diplomatico della Lombardia Medievale (Ansani 2000–2003). https://www.lombardiabeniculturali.it/cdlm/edizioni/co/lenno-smaria/.

CPLAD = Codex Principis olim Laureshamensis Abbatiae Diplomaticus: ex aevo maxime Carolingico diu multumque desideratus, vol. 2, (Lamey 1768).

Formulae Marculfi = Marculfi Formularum libri duo, (Uddholm 1962).

LLCT1 = Late Latin Charter Treebank 1, version 1.2, (Korkiakangas 2020a). https://zenodo.org/record/3633607#.X_NUMbNS9EY.

LLCT2 = Late Latin Charter Treebank 2, version 1.2, (Korkiakangas 2020b). https://zenodo.org/record/3633614#.X_NUYLNS9EZ.

LLCT3 = Late Latin Charter Treebank 3, version 1.0 (Korkiakangas 2021). http://alim.unisi.it/dl/fonti_documentarie (select Carte del fondo Diplomatico dell’Archivio storico diocesano di Lucca).

MED = Memorie e documenti per servire all’istoria del Ducato di Lucca, tomo V, parte III, (Barsocchini 1841).

MGH Capit. 1 = Monumenta Germaniae Historica, legum sectio II: Capitularia regum Francorum, tomus I, (Boretius 1883).

Pap. Tjäder 1 = Die nichtliterarischen Lateinischen Papyri Italiens aus der Zeit 445–700, vol. 1: Papyri 1–28, (Tjäder 1955).

Pap. Tjäder 2 = Die nichtliterarischen Lateinischen Papyri Italiens aus der Zeit 445–700, vol. 2: Papyri 29–59, (Tjäder 1982).

RCI = Regesta Chartarum Italiae: Regesto del Capitolo di Lucca, vol. III, (Guidi and Parenti 1933).

TLL = Thesaurus Linguae Latinae. Leipzig/Berlin: Teubner/De Gruyter, 1900–.

UASG = Urkundenbuch der Abtei Sanct Gallen, Theil 1: Jahr 700–840, (Wartmann 1863).

Secondary sources

Amelotti, Mario, and Giorgio Costamagna. 1975. Alle origini del notariato italiano. Roma: Consiglio nazionale del notariato.

Andreolli, Bruno, and Massimo Montanari. 1983. L’azienda curtense in Italia: proprietà della terra e lavoro contadino nei secoli VIII–XI. Bologna: CLUEB.

Bacher, Johann, Knut Wenzig, and Melanie Vogler. 2004. “SPSS TwoStep Cluster – A First Evaluation.” Arbeits- und Diskussionspapiere 2004(2): 1–25. Lehrstuhl für Soziologie, Sozialwissenschaftliches Institut, Friedrich-Alexander-Universität Erlangen-Nürnberg.

Barsocchini, Domenico, ed. 1841. Memorie e documenti per servire all’istoria del Ducato di Lucca, tomo V, parte III. Lucca: Francesco Bertini.

Bartoli Langeli, Attilio. 2006. Notai: scrivere documenti nell’Italia medievale. Roma: Viella.

Benassi, Mariagrazia, Sara Garofalo, Federica Ambrosini, Rosa Patrizia Sant’Angelo, Roberta Raggini, Giovanni De Paoli, Claudio Ravani, Sara Giovagnoli, Matteo Orsoni, and Giovanni Piraccini. 2020. “Using Two-Step Cluster Analysis and Latent Class Cluster Analysis to Classify the Cognitive Heterogeneity of Cross-Diagnostic Psychiatric Inpatients.” Frontiers in Psychology 10(11). DOI: http://doi.org/10.3389/fpsyg.2020.01085.

Bertini, Luca. 1972. “Peredeo vescovo di Lucca.” In Studi storici in onore di Ottorino Bertolini, vol. 1, 21–45. Pisa: Pacini.

Boretius, Alfred, ed. 1883. Monumenta Germaniae Historica, legum sectio II: Capitularia regum Francorum, tomus I. Hannoverae: Hahn.

Bougard, François. 2013. “Commutatio, cambium, viganeum, vicariatio: L’échange dans l’Italie des VIIIe–XIe siècles.” In Tauschgeschäft und Tauschurkunde vom 8. bis zum 12. Jahrhundert/L’acte d’échange, du VIIIe au XIIe siècle, edited by Irmgard Fees and Philippe Depreux, 65–98. (Archiv für Diplomatik, Schriftgeschichte, Siegel- und Wappenkunde, Beiheft 13.) Köln: Böhlau. DOI: http://doi.org/10.7788/boehlau.9783412211608.65.

Bruckner, Albert et al., eds. 1954–2001. Chartae Latinae Antiquiores: Facsimile-edition of the Latin Charters Prior to the Ninth Century. Olten: Urs Graf Verlag.

Cavallo, Guglielmo et al., eds. 1997–2019. Chartae Latinae Antiquiores: Facsimile-edition of the Latin Charters, 2nd Series: Ninth Century. Dietikon: Urs Graf Verlag.

CEMA (Cartae Europae Medii Aevi). 2023. “Cartae Europae Medii Aevi.” Accessed May 25, 2023. https://cema.lamop.fr.

Cerquiglini, Bernard. 1989. Éloge de la variante: Histoire critique de la philologie. Paris: Seuil.

DEEDS (Documents of Early England Data Set). 2023. “Documents of Early England Data Set.” Accessed May 25, 2023. https://deeds.library.utoronto.ca.

D’Hoop, Felix Henri, ed. 1870. Cartularium: Recueil des chartes du prieuré de Saint-Bertin, à Poperinghe, de ses dépendances à Bas-Warneton et à Couckelaere. Bruges: Vandecasteele-Werbrouck.

De Valeriola, Sébastien. 2020. “L’ordinateur au service du dépouillement de sources historiques.” Histoire & mesure 35: 171–196. DOI: http://doi.org/10.4000/histoiremesure.13534.

Fichtenau, Heinrich. 1957. Arenga: Spätantike und Mittelalter im Spiegel von Urkundenformeln. Graz, Köln: Böhlau.

Fichtenau, Heinrich. 1977. Beiträge zur Mediävistik, zweiter Band: Urkundenforschung. Stuttgart: Anton Hiersemann.

Galuščáková, Petra, and Lucie Neužilová. 2018. “Low Resource Methods for Medieval Document Sections Analysis.” In LREC 2018: Eleventh International Conference on Language Resources and Evaluation, May 7–12, 2018, Miyazaki, Japan, edited by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri et al., 2344–2348. ELRA. Accessed May 25, 2023. http://www.lrec-conf.org/proceedings/lrec2018/summaries/999.html.

Gervers, Michael. 1997. “The Dating of Medieval English Private Charters of the Twelfth and Thirteenth Centuries.” In A Distinct Voice: Medieval Studies in Honor of Leonard E. Boyle, edited by Jacqueline Brown and William Stoneman, 455–480. Notre Dame, IN: University of Notre Dame Press.

Gervers, Michael, Gelila Tilahun, Shima Koshraftar, Roderick Mitchell, and Ariella Elema. 2018. “The Dating of Undated Medieval Charters.” ARCHIVES: The Journal of the British Records Association 53(2): 1–33. DOI: http://doi.org/10.3828/archives.2018.7.

Ghignoli, Antonella, ed. 2006. Carte dell’Archivio arcivescovile di Pisa, Fondo arcivescovile, vol. 1 (720–1100). Pisa: Pacini.

Ghignoli, Antonella. 2007. “Repromissionis pagina: pratiche di documentazione a Pisa nel secolo XI.” Scrineum Rivista 4(4): 37–107. DOI: http://doi.org/10.13128/Scrineum-12112.

Ghignoli, Antonella. 2009. “Libellario nomine: rileggendo i documenti pisani dei secoli VIII–X.” Bullettino dell’Istituto storico italiano per il medio evo (111): 1–62.

Ghignoli, Antonella, and François Bougard. 2011. “Elementi romani nei documenti longobardi?” In L’héritage byzantin en Italie (VIIIe-XIIe siècle), tome I: la fabrique documentaire, edited by Jean-Marie Martin, Annick Peters-Custot, and Vivien Prigent, 241–301. Roma: École française de Rome.

Guidi, Pietro, and Oreste Parenti, eds. 1933. Regesta Chartarum Italiae: Regesto del Capitolo di Lucca, vol. III. Roma: Istituto storico italiano.

IBM. 2023. “TwoStep Cluster Analysis.” Accessed May 25, 2023. https://www.ibm.com/docs/en/spss-statistics/25.0.0?topic=features-twostep-cluster-analysis.

Keller, Hagen. 1973. “La marca di Tuscia fino all’anno Mille.” In Atti del V Congresso internazionale di studi sull’alto medioevo, 117–136. Spoleto: Centro italiano di studi sull’alto medioevo.

Korkiakangas, Timo. 2018. “Spelling Variation in Historical Text Corpora: The Case of Early Medieval Documentary Latin.” Digital Scholarship in the Humanities 33(3): 575–591. DOI: http://doi.org/10.1093/llc/fqx061.

Korkiakangas, Timo, ed. 2020a. Late Latin Charter Treebank 1, version 1.2. Accessed April 24, 2023. https://zenodo.org/record/3633607#.X_NUMbNS9EY.

Korkiakangas, Timo, ed. 2020b. Late Latin Charter Treebank 2, version 1.2. Accessed April 24, 2023. https://zenodo.org/record/3633614#.X_NUYLNS9EZ.

Korkiakangas, Timo, ed. 2021. Late Latin Charter Treebank 3, version 1.0. TEI XML edition in the Archivio della Latinità Italiana del Medioevo (ALIM). Accessed April 24, 2023. http://alim.unisi.it/dl/fonti_documentarie. (Note: select Carte del fondo Diplomatico dell’Archivio storico diocesano di Lucca).

Korkiakangas, Timo. 2022. “From Memory or Formulary: How Were Medieval Documentary Formulae Reproduced?” Mirator 22(1): 4–24. DOI: http://doi.org/10.54334/mirator.v22i1.119760.

Korkiakangas, Timo. 2023. “Spelling Correctness as a Witness of Changing Documentary Culture in Tuscia (eighth–ninth centuries).” Early Medieval Europe 31(2): 220–251. DOI: http://doi.org/10.1111/emed.12619.

Kurze, Wilhelm, ed. 1974. Codex diplomaticus Amiatinus, erster Band: von den Anfängen bis zum Ende der Nationalkönigsherrschaft (736–951). Tübingen: Max Niemeyer Verlag.

Lamey, Andreas, ed. 1768. Codex Principis olim Laureshamensis Abbatiae Diplomaticus: ex aevo maxime Carolingico diu multumque desideratus, vol. 2. Mannhemii: Typis Academicis.

Leclercq, Eveline, and Mike Kestemont. 2021. “Advances in Distant Diplomatics: A Stylometric Approach to Medieval Charters.” Interfaces: A Journal of Medieval European Literatures 8: 214–244. DOI: http://doi.org/10.54103/interfaces-08-10.

Mailloux, Anne. 2008. “L’émergence du notariat à Lucques (VIIIe–Xe siècle): normes et pratiques d’un corps professionnel.” In Le notaire: entre métier et espace public en Europe VIIIe–XVIIIe siècle, edited by Lucien Faggion, Anne Mailloux, and Laure Verdon, 13–27. Aix-en-Provence: Publications de l’Université de Provence. DOI: http://doi.org/10.4000/books.pup.7283.

McKitterick, Rosamond. 1989. The Carolingians and the Written Word. Cambridge, UK: Cambridge University Press. DOI: http://doi.org/10.1017/CBO9780511583599.

Mommsen, Theodor, ed. 1873. Corpus Inscriptionum Latinarum, vol. III: Inscriptiones Asiae, provinciarum Europae Graecarum, Illyrici Latinae. Berolini: apud G. Reimerum.

Ostrowski, Alina. 2021. “Automatische Erkennung und Klassifikation von Formularbestandteilen in Königsurkunden: Zur Aufbereitung digitaler Urkundenkorpora in der Mediävistik.” In Die Historischen Grundwissenschaften heute: Tradition – methodische Vielfalt – Neuorientierung, edited by Étienne Doublier, Daniela Schulz, and Dominik Trump, 139–166. Wien: Böhlau. DOI: http://doi.org/10.7788/9783412520663.139.

Perreaux, Nicolas. 2016. “L’écriture du monde II: L’écriture comme facteur de régionalisation et de spiritualisation du mundus: études lexicales et sémantiques.” Bulletin du centre d’études médiévales d’Auxerre (BUCEMA) 20(1): 1–37. DOI: http://doi.org/10.4000/cem.14452.

Perreaux, Nicolas. 2021. “Possibilities, Challenges and Limits of a European Charters Corpus (Cartae Europae Medii Aevi – CEMA).” Sciences de l’Homme et de la Société. Accessed May 25, 2023. https://hal.science/hal-03203029.

Pezzola, Rita, ed. 2011. “Le carte dei monasteri di S. Maria dell’Acquafredda di Lenno e di S. Benedetto in val Perlana (1042–1200).” In Codice diplomatico della Lombardia Medievale, edited by Michele Ansani. Accessed April 24, 2023. https://www.lombardiabeniculturali.it/cdlm/edizioni/co/lenno-smaria/.

Schwarzmaier, Hans-Martin. 1972. Lucca und das Reich bis zum Ende des 11. Jahrhunderts: Studien zur Sozialstruktur einer Herzogstadt in der Toscana. Berlin, Boston: Max Niemeyer Verlag. DOI: http://doi.org/10.1515/9783111678078.

Sketch Engine. 2023. “Sketch Engine.” Accessed May 25, 2023. https://www.sketchengine.eu.

Tjäder, Jan-Olof, ed. 1955. Die nichtliterarischen Lateinischen Papyri Italiens aus der Zeit 445–700, vol. 1: Papyri 1–28. Lund: C.W.K. Gleerup.

Tjäder, Jan-Olof, ed. 1982. Die nichtliterarischen Lateinischen Papyri Italiens aus der Zeit 445–700, vol. 2: Papyri 29–59. Lund: Åströms Förlag.

Todros, Gabriela. 2010. “Brevi considerazioni sul patrimonio archivistico dell’Arcidiocesi lucchese.” In Il patrimonio documentario della chiesa di Lucca: prospettive di ricerca, edited by Sergio Pagano and Pierantonio Piatti, xi–xiv. Firenze: Galluzzo.

Uddholm, Alf, ed. 1962. Marculfi Formularum libri duo. Vpsaliae: Eranos’ förlag.

Wartmann, Hermann, ed. 1863. Urkundenbuch der Abtei Sanct Gallen, Theil 1: Jahr 700–840. Zürich: Antiquarische Gesellschaft in Zürich.

Witt, Ronald G. 2012. The Two Latin Cultures and the Foundations of Renaissance Humanism in Medieval Italy. New York: Cambridge University Press. DOI: http://doi.org/10.1017/CBO9780511779299.

Zielinski, Herbert. 1972. Studien zu den spoletinischen “Privaturkunden” des 8. Jahrhunderts und ihrer Überlieferung im Regestum Farfense. Tübingen: Niemeyer. DOI: http://doi.org/10.1515/9783110955187.