1. Introduction

§1  Recent years have seen a widespread application of stylometric analysis for the purpose of authorial attribution in medieval and early modern texts (Grzybek 2014; Kestemont 2012; Binongo 2003; Van Dalen-Oskam and van Zundert 2007; Love 2002; Souvay and Pierrel 2009; Koppel et al. 2009; Juola 2008). However, authorship can also be a question of degree. When texts arise out of processes of rewriting and adaptation, the dividing line between what is “original” authorship and what is primarily translation or adaptation can become blurred. Several major literary works from the late medieval period, including Thomas Malory’s Morte Darthur and Geoffrey Chaucer’s Canterbury Tales, are largely adaptations of earlier texts and therefore provide a useful framework for applying quantitative stylistic analysis to the process of source adaptation (Kelly 1978, 291; Kestemont 2012; Kestemont 2015). In particular, the linguistically varied source texts that late medieval adaptors worked with constitute a potential wealth of information concerning the genesis of the linguistic and stylistic features shaping the literary texts at hand (Mairey 2006; Norris 2008; Davidson 2008). In this respect, the rewriting and adaptation of source texts allows for greater insights into the act of literary creation than can be afforded by the black box of “original” literary works.

§2  Applying stylometry to medieval texts presents a specific set of challenges and opportunities. On the one hand, the pre-processing required for digitally analyzing these texts is laborious because the corpus needs to be normalized in order to remedy the spelling variation that is characteristic of the period (Piotrowski 2012; Kestemont et al. 2010). On the other hand, many late medieval texts are the product of idiosyncratic processes of source adaptation, as well as collaborations between authors and scribal amanuenses, which have been of particular interest to scholars in the wake of the New Philology (Thompson 2016, 14; Benedict 2004, viii–ix; Kestemont et al. 2015). These more complex forms of authorship present a promising field for further developing and testing methods of stylometric analysis (Reynolds et al. 2012). Mike Kestemont et al.’s study of the twelfth-century Latin works of Hildegard von Bingen and her amanuensis Guibert of Gembloux demonstrates how the stylometric techniques employed for authorial attribution studies can be adapted to the task of distinguishing between different authorial styles at work in collaborative medieval texts (Kestemont et al. 2015). Their methodology and results are also highly significant for the study of source adaptation, where the stylistic features of the source texts intersect with the stylistic features of the adaptation.

§3  In the following we will present the results of a basic stylometric analysis of the language used in the different sections of Thomas Malory’s fifteenth-century Arthurian collection Morte Darthur (Morte) and discuss how our findings contribute to a number of scholarly debates concerning the Morte.

§4  As an interlinked collection of eight “tales” or “books” based on a range of source texts, the Morte exemplifies the variety and scope of late medieval adaptations (Norris 2008, 164; Archibald 2013, 176). The current state of the art regarding which sources Malory used in which sections of his work has been defined by Ralph Norris’ seminal monograph on Malory’s Library (Norris 2008), which devotes a chapter to each section, or “book”, of Malory’s Arthuriad and discusses in detail which sections of which sources Malory uses where. Norris identifies eight major source texts, two in Middle English and six in Old French, which were used for different sections of the Morte (Norris 2008, 164, see Table 1). He synthesizes and expands upon seven decades of preceding research concerning Malory’s sources (Simko 1957; Bennett and Oakeshott 1963; Lumiansky 1964; Batt 2002), and his findings can be regarded as resembling “a complete list of Malory’s sources” (Norris 2008, 163).

Table 1

Sections of the Morte Darthur listed alongside their major sources as identified by Norris (2008, 163–4). For the purposes of this study, the Morte Darthur has been divided into eight books in accordance with the findings of Eugène Vinaver (1947), each corresponding to a “tale”. The differing chapter subdivisions contained in William Caxton’s edition of the Morte Darthur are given for information only. The “source language” column describes the major sources only.

Book | Tale | Caxton Chapter | Source Language | Major Source

I | Arthur | 1–4 | Old French | Prose Merlin, Post-Vulgate Suite du Merlin
II | Roman War | 5 | Middle English | Alliterative Morte Arthure
III | Lancelot | 6 | Old French | Prose Lancelot
IV | Gareth | 7 | ? | ?
V | Tristram | 8–12 | Old French | Prose Tristan
VI | Sankgreal | 13–17 | Old French | Vulgate Queste del Saint Graal
VII | Lancelot & Guinevere | 18–19 | Middle English/Old French | Stanzaic Morte Arthur/Vulgate La Mort Artu
VIII | Death of Arthur | 20–21 | Middle English | Stanzaic Morte Arthur

§5  While the overall question concerning which sources Malory used has largely been answered, medievalists have continued to focus on four questions that remain unresolved. In the following we will outline the state of the art concerning each of these open questions.

§6  One such question concerns the source Malory used for his Book IV, “The Tale of Sir Gareth”, which is the only tale for which neither Norris nor any other Malorian has convincingly identified a major source. Norris suggests that the source was a now-lost romance written in Old French (Norris 2008, 159), while D. Thomas Hanks Jr suggests an oral folktale source (Hanks Jr 2003, 52) and P.J.C. Field has posited that Malory was working with a lost Middle English romance (Field 1995, 255). Arnold Sanders, on the other hand, argues that there was no lost major source and that Malory’s tale represents an original reworking of the Gawain romances and the trope of the “Fair Unknown” (Sanders 2006, 34). The controversy concerning whether or not Malory worked with a major source for this section of his work and, if so, what the genre and language of that source were remains a point of particular interest to Malory scholars.

§7  A second and more fundamental unresolved question regarding the Morte concerns the differences between the two surviving versions of the text: that contained in the fifteenth-century Winchester manuscript and that contained in William Caxton’s 1485 print edition. The question of how these two versions are related to each other first arose following W.F. Oakeshott’s discovery of the Winchester manuscript in 1936 and Eugène Vinaver’s revelation that the manuscript version differed significantly from the Caxton print edition, which had hitherto been the only surviving version of the Morte (Gordon and Vinaver 1937; Norris 2006, 68). Vinaver claims that the Winchester manuscript presents the Arthurian story in eight separate sections, referred to as “tales” or “books” (Gordon and Vinaver 1937). Based on this evidence, Vinaver concludes that Malory did not intend the Morte to be read as one work and that his text was instead a collection of eight separate tales (see Vinaver’s edition of The Works of Sir Thomas Malory, 1947). This “unity” debate remains unresolved, and, while several recent editions have combined different aspects of the Winchester and Caxton versions and present the Morte as one work (Field 2013; Shepherd 2003), the degree to which the tales are unified remains a point of debate (Clark 2014, 92–3).

§8  The discovery of the Winchester manuscript raised a third controversial issue concerning the genesis of the Morte. Vinaver draws particular attention to Book II of the Morte, known as the “Roman War” section, as the part of the text where the greatest differences between the Winchester and Caxton versions of the story can be found (Gordon and Vinaver 1937, 81). Following the discovery of the Winchester Manuscript, Jan Simko produced a parallel-text analysis of the version of the “Roman War” episode contained in the Winchester manuscript, that contained in the Caxton edition, and the Alliterative Morte Arthure, which is the major source of the “Roman War” section (Simko 1957). He concludes that differences in word order between the Caxton and the Winchester versions were dictated by functional, grammatical, rhythmic and stylistic factors (Simko 1957, vii). Simko also argues that the differences between the two versions should be attributed to Caxton’s editorial intervention (Simko 1957, ix). This position has been opposed by William Matthews, who draws on internal stylistic evidence from the Caxton and Winchester versions of the “Roman War” episode to argue that Malory himself made the revisions to his text that appear in the Caxton edition (Matthews 1997; Moorman 1995, 25). There is currently no scholarly consensus on who was responsible for the differences between the two surviving versions of the “Roman War” episode (Moorman 1995, 24–5).

§9  One reason why scholars have been so interested in the “Roman War” episode is the possibility that the two different versions represent different stages of Malory’s revision process (Matthews 1997; Moorman 1995, 28–9). If it were possible to establish that Malory was himself responsible for both versions of this section, researchers would gain new insights into Malory’s creative process as a rewriter.

§10  This issue feeds into the fourth and final open question regarding the Morte. Scholars have disagreed on Malory’s role in creating the Morte, and in particular on where he should be placed on the continuum between “faithful translator” and “original author” (Lewis 1963; Lynch 2006; Davidson 2008). This debate has also hinged on the question of the extent to which Malory re-shapes the linguistic and stylistic features of his source texts in ways that go beyond straightforward translation. Malory was long regarded as a simplistic translator who “has no style of his own” (Lewis 1963, 23). However, since the 1990s there has been a growing interest in Malory’s written style and reshaping of his narrative material. These arguments have been reviewed in Andrew Lynch’s article on “A Tale of ‘Simple’ Malory” (Lynch 2006). Thus, for example, Jeremy Smith argues that Malory’s paratactic composition of the text is “intensely audience-centred” and indicative of stylistic experimentation (Smith 1996, 104). Similarly, Ingrid Tieken-Boon van Ostade, Shunichi Noguchi, and Toshiyuki Takamiya have undertaken targeted analyses of specific linguistic and stylistic features of Malory’s writing (Tieken-Boon van Ostade 1995; Noguchi 1995; Takamiya 1993). More recently, Roberta Davidson has drawn on Catherine Batt to argue for a view of Malory’s adaptation process that gives due recognition to his role at the interface between being a “translator as writer” and a “translator as reader” (Batt 1989, 143–7; Davidson 2008, 133–4).

§11  Overall, scholarship on the Morte has so far failed to reach a consensus on the four questions discussed above. On the one hand, numerous studies of the Morte have focused on the small-scale qualitative analysis of individual sections of the work in order to determine Malory’s treatment of his sources in individual tales. On the other hand, larger-scale studies of Malory’s language have focused on analyzing linguistic and stylistic usage throughout the Morte without fully taking into account the role of Malory’s many different source texts in shaping his writing.

§12  What these studies leave open is the question of the extent to which the eight sections of Malory’s Morte differ from each other linguistically and how these differences may reflect the linguistic and stylistic features of the corresponding Middle English and Old French source texts he was working with. Our study is based on the hypothesis that there is linguistic variation between the different sections of the Morte and that these differences reflect the influence of the various source texts Malory was working with for each of his tales. It has been undertaken on the understanding that identifying these influences has the potential to shed light on several unresolved questions concerning the overall unity of the Morte, Malory’s adaptation process, the possible source used for Malory’s “Tale of Sir Gareth” and the genesis of the two surviving versions of Malory’s “Roman War” episode.

2. Methods

§13  Our first step was to download Caxton’s edition of the Morte from the Corpus of Middle English Prose and Verse hosted by the University of Michigan (http://name.umdl.umich.edu/MaloryWks2). We then removed all HTML tags, notes and chapter titles and merged the 21 “chapters” into eight “books” (see Table 1). These books are aligned with the divisions in the Winchester manuscript identified by Vinaver as separating Malory’s work into eight books (Gordon and Vinaver 1937, passim). We normalized the text to unify spelling variants (Table 2), adopting the procedure outlined by Kestemont et al. (2015). Thus all non-standard Latin characters were replaced (e.g. “þ” with “th”) and “i/j/y” were treated as the same letter, as were “u/v/w”. Double consonants were replaced with single consonants. The raw and the normalized corpus were made publicly available under the document identifier DOI: 10.5281/zenodo.2639708.

Table 2

Lemmatization. Letter patterns were substituted with the replacements indicated. “{}” indicates a zero-length character string. Furthermore, every word was stemmed by shortening it to four letters.

Pattern | Replacement

-e, -es, -est, -eth | {}
i, j | y
u, w | v
-oo | o
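The normalization and truncation rules summarized in Table 2 can be illustrated with a minimal Python sketch. It is not the code used for the study: the order in which the rules are applied, the function names, and the guard that leaves a suffix in place when fewer than three letters would remain (so that short words such as “the” and “yes” survive) are all assumptions made for illustration, and the sketch is not guaranteed to reproduce every token in Table 3.

```python
import re

# Suffixes from Table 2, longest first so that "-est" is tried before "-e".
SUFFIXES = ("est", "eth", "es", "e")
MIN_STEM = 3  # assumption: keep the suffix if fewer than 3 letters would remain
MAX_LEN = 4   # tokens are shortened to four letters

def normalize(word):
    """Unify spelling variants: thorn, i/j/y, u/v/w and doubled consonants."""
    w = word.lower().replace("þ", "th")
    w = re.sub(r"[ij]", "y", w)
    w = re.sub(r"[uw]", "v", w)
    # Collapse doubled consonants (y and v are left out because, after the
    # substitutions above, they may stand for vowels).
    w = re.sub(r"([bcdfghklmnpqrstxz])\1", r"\1", w)
    return w

def simple_stem(word):
    """Normalize, strip one Table 2 suffix, reduce final -oo, then truncate."""
    w = normalize(word)
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            w = w[: -len(suffix)]
            break
    if w.endswith("oo"):
        w = w[:-1]
    return w[:MAX_LEN]

print([simple_stem(w) for w in ["quene", "knyght", "Launcelot", "the", "shall"]])
```

Under these assumptions “quene” becomes “qven”, “knyght” becomes “knyg” and “Launcelot” becomes “lavn”, which coincides with the corresponding forms listed in the caption of Table 3.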

§14  After normalizing the spelling, we applied three tokenization methods, which we term “simple stemming”, “Porter-like stemming” and “dictionary lemmatization”. Our “simple stemming” involved removing suffixes such as “-e” and “-es” (with a few exceptions, such as “yes”) and harmonizing the spelling of a small number of high-frequency words. Finally, we shortened all tokens to a maximum of four letters. We then manually inspected our list of approximately 200 high-frequency words and carried out further tokenization where necessary; for example, “hit” was always categorized as representing the same token as “it”. Some words representing content rather than function were removed to prevent plot features from biasing the statistical analysis. Thus, we excluded proper names (“Arthur”, “Launcelot”) and nouns and adjectives referring to the main protagonists (“quen”, “fayr”) from our further analysis. The full list of excluded words is given in the caption of Table 3. Following this process of manual curation, we were left with a list of 135 words, which we have used as the basis for our further analysis (Table 3).

Table 3

High frequency one-word tokens in the Morte. Words were shortened to four letters, suffixes such as “e” and “es” were removed and spelling was simplified (see Table 2), e.g. the letters “i”, “j” and “y” are all represented as “y”. Only words that occurred reproducibly in 80% of the samples were selected. The following words, which we regard as being indicative of content, were removed from the list: “syr”, “knyg”, “kyng”, “lavn”, “arth”, “gava”, “lady”, “qven”, “lord”, “fayr”, “nobl”, “damo”, “bors”.


a afte agey al am an and anon as at
bata be ben both brod bvt by cam cast com
covn covr day ded depa do don dovn dyd ever
for from god good gret had hand hath hav he
her hors hov hym hys knov let lov mad mak
man many may me men moch mor my mygh nam
neve no non not nov nygh of on ony or
othe over ovt pass pray rod rygh sav say sayd
see self she shal shel shol slay smot so sper
stro svch sver tak the that them then ther they
thov thr thvs thy thys to told took tvo tym
vas ve vel vent ver vet vhan vhat vher vhyc
vnto vold vors vpon vs vyl vyth yes ye yf
yn yov yovr ys yt

§15  For comparison we also undertook “Porter-like stemming”, where we applied the same workflow as for “simple stemming” but also implemented many of the rules of the Porter stemmer: thus suffixes such as “-ly” and “-yng” were removed in addition to the ones listed in Table 2. Finally, we tested a form of “dictionary lemmatization”: we created a dictionary for all high-frequency words and manually assigned them to a lemma; for example, “hors”, “horses”, “horsbak”, “horsed”, “horsback”, “horse”, “horsemen”, “horsbere”, “horsfeet”, and “horseman” were all assigned to the lemma “hors”. The final dictionary contained 160 lemmata. The overall results of our “Porter-like stemming” and our “dictionary lemmatization” can be viewed in the supplementary material (Supplementary Figures S1 and S2). The results of the “dictionary lemmatization” were also used for the MDS bi-plot showing the word loadings in Figure 5.
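The “dictionary lemmatization” described above amounts to a lookup from surface forms to manually assigned lemmata. The following Python sketch shows one possible shape of such a lookup. Only the “hors” entry is taken from the text; the variable and function names are hypothetical, and the decision to drop tokens that are not in the dictionary is an assumption, since the text does not state how such tokens were handled.

```python
# Hypothetical excerpt of a lemma dictionary. Only the "hors" entry comes
# from the text; the dictionary described there contained 160 lemmata.
LEMMA_FORMS = {
    "hors": ["hors", "horses", "horsbak", "horsed", "horsback", "horse",
             "horsemen", "horsbere", "horsfeet", "horseman"],
}

# Invert the dictionary so that each surface form points to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEMMA_FORMS.items() for form in forms
}

def lemmatize(tokens, keep_unknown=False):
    """Map tokens to lemmata. Tokens outside the dictionary are dropped
    unless keep_unknown is True (an assumption of this sketch)."""
    out = []
    for tok in tokens:
        lemma = FORM_TO_LEMMA.get(tok.lower())
        if lemma is not None:
            out.append(lemma)
        elif keep_unknown:
            out.append(tok.lower())
    return out

print(lemmatize(["horsbak", "Horsemen", "castel"]))
```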

§16  Our next step was to undertake statistical analysis of two different feature frequency tables: one in which the text was tokenized as 1-word tokens and one in which the lemmatized text was tokenized as consecutive 2-word tokens. After tokenization, 20 independent random samples were drawn from each book by sampling with replacement, of 5,000 or 7,500 tokens for 1-word and 2-word tokens respectively (this process of sampling is discussed by Eder 2015). As the text overall contains fewer high-frequency 2-word tokens, we had to collect more tokens per sample in this dataset in order to ensure the numerical stability of the analysis.
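The sampling step can be sketched in Python as follows. This is an illustration rather than the procedure actually run in R: the function names and the fixed seed are hypothetical, and the sketch assumes that individual tokens (not contiguous passages) are drawn at random after the 2-word tokens have been formed.

```python
import random

def two_word_tokens(tokens):
    """Join each pair of consecutive tokens into one 2-word token."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def draw_samples(tokens, sample_size, n_samples=20, seed=0):
    """Draw n_samples independent samples of sample_size tokens each,
    sampling with replacement (random.Random.choices does exactly that)."""
    rng = random.Random(seed)
    return [rng.choices(tokens, k=sample_size) for _ in range(n_samples)]

# Toy "book"; in the study each book yields 20 samples of 5,000 one-word
# tokens or 7,500 two-word tokens.
book = ["the", "kyng", "rod", "vnto", "the", "cast"]
one_word_samples = draw_samples(book, sample_size=5000)
two_word_samples = draw_samples(two_word_tokens(book), sample_size=7500)
print(len(one_word_samples), len(two_word_samples[0]))
```

Because sampling is done with replacement, the sample size may exceed the number of tokens available in a short book, which is what makes equal-sized samples possible for all eight books.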

§17  We retrieved high-frequency tokens by “culling” (Hoover 2004a, 2004b) with a document frequency cutoff of 80% (i.e. by removing all words that occur in fewer than 80% of all samples). Drawing on the culled frequency table, we then calculated Burrows’ Delta distance (Burrows 2002) between the samples and carried out multi-dimensional scaling (MDS). We repeated the analysis with two alternative distance measures: the Cosine Delta distance (Jannidis et al. 2015) and the Euclidean distance. (It is worth noting that “classical” MDS, based on singular value decomposition and using the Euclidean distance, is equivalent to performing principal component analysis (PCA).) In addition to 1-word and 2-word tokens, we also collected samples of character 3-, 4- and 5-grams of normalized but unstemmed words (Stamatatos 2006). After culling with a 95% document frequency cutoff, we performed MDS using the same methodology as before. All computation was undertaken in R (R Core Team 2018), making heavy use of functions provided in the stylo package (Eder et al. 2016). We employed the package plot3D (Soetaert 2017) for 3D plotting and the package car (Fox and Weisberg 2011) for plotting ellipses using the function dataEllipse at confidence levels of 0.5 and 0.95. We carried out MDS using the function cmdscale. MDS eigenvalues were used to calculate the proportion of variance represented in each MDS dimension. Word loadings of an MDS dimension were calculated using Pearson’s correlation coefficient.
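The culling step and Burrows’ Delta can be illustrated with a short Python sketch. It does not reproduce the stylo implementation used in the study: the function names are hypothetical, the use of the sample standard deviation for the z-scores is an assumption, and features with zero variance are simply dropped here. Burrows’ Delta is computed as the mean absolute difference between the z-scored relative frequencies of two samples.

```python
from collections import Counter
from statistics import mean, stdev

def relative_frequencies(samples):
    """One {token: relative frequency} dictionary per sample."""
    return [{w: c / len(s) for w, c in Counter(s).items()} for s in samples]

def cull(freqs, cutoff=0.8):
    """Keep tokens that occur in at least `cutoff` of all samples."""
    doc_freq = Counter(w for f in freqs for w in f)
    return sorted(w for w, n in doc_freq.items() if n / len(freqs) >= cutoff)

def burrows_delta(freqs, features):
    """Pairwise Burrows' Delta: mean absolute difference of z-scores."""
    z = {}
    for w in features:
        col = [f.get(w, 0.0) for f in freqs]
        mu, sd = mean(col), stdev(col)
        if sd > 0:  # assumption: constant features carry no signal
            z[w] = [(x - mu) / sd for x in col]
    n = len(freqs)
    return [[mean(abs(z[w][i] - z[w][j]) for w in z) for j in range(n)]
            for i in range(n)]

samples = [["the", "the", "and", "he"],
           ["the", "and", "and", "he"],
           ["the", "and", "he", "rare"]]
freqs = relative_frequencies(samples)
features = cull(freqs, cutoff=0.8)   # "rare" occurs in 1 of 3 samples
distances = burrows_delta(freqs, features)
print(features, distances[0][1])
```

The resulting symmetric distance matrix is the kind of input that a classical MDS routine, such as the R function cmdscale named above, takes.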

3. Results

§18  The MDS plot of our results, shown in Figure 1, strongly supports our hypothesis that there is linguistic variation between the different sections of the Morte and that this can be attributed in part to the influence of the different source texts Malory was working with for his different tales. This is confirmed by the affinity our graphs show between the text sections that are based on major English source texts (Books VII and VIII, marked in light and dark blue respectively). The samples drawn from these sections cluster in the upper right-hand quadrant of Figure 1. Meanwhile, the samples drawn from those sections based primarily on major French source texts (Books I, III, V, and VI, marked in orange, red, yellow, and pink respectively) cluster in the lower right-hand quadrant of the MDS graph. We further observe that the samples from Book III (“Lancelot”) lie at one extreme, and the samples from Books VII and VIII lie at the other extreme, of the second MDS dimension (MDS2, corresponding to the vertical axis in Figure 1). Since Malory’s major source texts are French for the former and English for the latter, it seems feasible to conclude that MDS2 captures stylistic features that reflect the linguistic influence of the language of the source texts. This is confirmed by the fact that the samples from Books I, V and VI, which are also known to be based on French source texts, are all positioned closer to samples from Book III in MDS2. It is also particularly striking that the samples drawn from the “Roman War” section (Book II, marked in green), which has attracted so much scholarly attention, also appear as an anomaly in our analysis and stand out from all other sections by clustering in the left-hand half of Figure 1.

Figure 1 

Multi-Dimensional Scaling (MDS) scatterplot of high-frequency words in the Morte. Each filled circle represents a sample of 5,000 tokens (obtained by “simple stemming”, see ‘Methods’), the colour indicating which of the eight “books” (Table 1) it was taken from. The ellipses are drawn at 50% and 95% confidence levels calculated from the 20 samples that were drawn from each “book”. The MDS was calculated from Burrows’ Delta distances between samples. The first two dimensions of the MDS (MDS1 and MDS2) are shown. The proportion of variance represented in each dimension is given in brackets.

§19  The separation of the “Roman War” section is visible in the first MDS dimension (MDS1, horizontal axis) of Figure 1, not the second (MDS2, vertical axis). The first dimension of an MDS represents the features that constitute the largest degree of dissimilarity between the samples (i.e. the difference between the books of the Morte assessed by token frequencies). On a hermeneutic level the features represented in MDS1 should be regarded as the most salient features of the dataset. As Figure 1 shows, this MDS dimension draws a clear dividing line between the Roman War and all the other books of the Morte, giving a strong indication that this book does not map onto a continuous scale of authorial style of the other books. With respect to the specific types of word frequencies that are represented in the first dimension of the MDS all the other books are more or less indistinguishable (Figure 1).

§20  While Books I and III–VIII all show values of between 0.2 and 0.4 on the horizontal axis (MDS1), the “Roman War” section in Book II is centred at 0.9. We applied a standard statistical test (t-test) to investigate how likely it is that Book II belongs to the same population (i.e. has the same style, based on MDS1) as all the other books, and obtained a very clear-cut answer: in all probability, it does not belong to the same population (p < 1e–15). By contrast, the differences between the other books of the Morte are located entirely in the second dimension of the MDS shown in Figure 1 and must therefore originate from different types of stylistic features than those that set the “Roman War” section apart from all the other books.
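The comparison of MDS1 coordinates can be sketched in Python as a two-sample t-test. The text does not specify which variant of the t-test was used, so the Welch (unequal variance) form shown here is an assumption, as are the function name and the coordinates, which are invented for illustration and are not the study’s data. Only the test statistic and the degrees of freedom are computed; obtaining a p-value additionally requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    Requires at least two values per group and non-zero overall variance."""
    a = variance(x) / len(x)
    b = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df

# Invented MDS1 coordinates, for illustration only.
book_ii = [0.88, 0.91, 0.90, 0.93]
other_books = [0.22, 0.31, 0.38, 0.27, 0.35, 0.29]
t, df = welch_t(book_ii, other_books)
print(round(t, 2), round(df, 2))
```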

§21  The first two dimensions of the MDS appear to represent two almost entirely independent effects, which have a signal that is far above the noise level contained in the data: the first effect might be regarded as that of third-party involvement in the writing of the “Roman War” section and the second effect is that of whether the respective section is based on French or English source texts. Higher MDS dimensions, such as the third MDS dimension, shown as the z axis in Figure 2, represent somewhat less important, or statistically less pronounced, features. These higher dimensions reveal a tendency to further separate the individual books. In MDS3, the “Tale of the Sankgreal” (Book VI) is separated from Books I, III and IV; in even higher MDS dimensions up to MDS7 the differences between Books I, III and IV are revealed (Supplementary Figure S3). The eighth MDS dimension is the first that no longer shows a distinguishable signature.

Figure 2 

Three-dimensional representation of the first three dimensions of a multi-dimensional scaling of one-word tokens in the Morte. The MDS is the same as shown in Figure 1.

§22  At this stage it should be noted that MDS, like PCA, is an unsupervised technique that does not take class labels (i.e. which book a sample originates from) into account. Thus, in principle, the MDS analysis is blind and even-handed with regard to our classification of the different sections of the work into books and the different source-text languages involved and only represents the distances between samples. As we have sampled with replacement, it is possible that the degree to which the samples taken from the same book cluster together is slightly overstated in the MDS, especially for the shorter “books”. Despite this caveat, the overall class separation revealed in the MDS remains reliable, as it is representative of actual differences between the classes with respect to the underlying data. This makes it all the more significant that the different MDS dimensions reveal a consistent and increasingly fine-grained degree of distinction between all of the different books.

§23  Moreover, we also obtained very similar results when we used two-word tokens and character 3-, 4- or 5-grams instead of one-word tokens (compare Figure 1 to Figures 3 and 4 and Supplementary Figures S1 and S2) or applied different distance measures (Supplementary Figures S1 and S2). This offers further evidence that the linguistic signal we have detected in the MDS is strong and reliable.

Figure 3 

Multi-dimensional scaling scatterplot of two-word tokens in the Morte. Each filled circle represents a sample of 7,500 two-word tokens (obtained by “simple stemming”). The two first dimensions of the MDS are shown, the colours and symbols have the same meaning as in Figure 1.

Figure 4 

Multi-dimensional scaling scatterplot of character 3-grams in the Morte. Each filled circle represents a sample of 5,000 character 3-grams (three consecutive characters within a word). The two first dimensions of the MDS are shown, the colours and symbols have the same meaning as in Figure 1.

§24  To offer a better insight into which high-frequency words are distinctive for the different “books” of the Morte, we have used the results obtained with “dictionary lemmatization” as the basis for an MDS bi-plot showing the loadings of the 57 lemmas that have contributed most strongly to the first two dimensions of the MDS (Figure 5). The results suggest a greater use of romance words related to formal battle (such as “counceylle” and “bataille”) in the Roman War section, which is located on the left half of this MDS. This section also shows an increased use of third-person pronouns “his”, “them” and “they”. By contrast, the other “books” based on a major Middle English source, which are located in the right-hand upper quadrant of the MDS, show a greater use of personal pronouns denoting direct speech, such as “I”, “me”, “my” and “yow”, as well as a greater use of “allas”, which also suggests direct speech. At the same time, those “books” based on an Old French major source text, located in the lower half of the MDS, show a greater usage of words related to chivalry and questing, such as “sheld”, “hors”, “rode”, “spere” and “castel”, and greater use of the numbers “one”, “two”, “thre” and “four” (see Table 4), which may be related to descriptive details, than those “books” based on major Middle English sources.

Figure 5 

MDS bi-plot showing the loadings as arrows. Only variables with an arrow length of at least 0.6 in the unit circle were included. Labels show the most frequent word of a group of words that are combined into one variable (lemma). This MDS was calculated from the dictionary lemmatized corpus and the distance function was Burrows’ Delta.
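The loadings shown in the bi-plot can be sketched in Python as Pearson correlations between each lemma’s frequency across samples and the samples’ coordinates on an MDS dimension, followed by the arrow-length filter mentioned in the caption. The function names and the toy data below are hypothetical, and the sketch assumes that arrow length is the Euclidean norm of the two loadings.

```python
from math import hypot, sqrt

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def loadings(freq_table, mds1, mds2, min_length=0.6):
    """Return {lemma: (loading on MDS1, loading on MDS2)} for every lemma
    whose arrow is at least min_length long inside the unit circle."""
    selected = {}
    for lemma, column in freq_table.items():
        l1, l2 = pearson(column, mds1), pearson(column, mds2)
        if hypot(l1, l2) >= min_length:
            selected[lemma] = (l1, l2)
    return selected

# Toy data: frequencies of two lemmata in four samples, plus the samples'
# coordinates on the first two MDS dimensions (all values invented).
freq_table = {"my": [1, 2, 3, 4], "noise": [2, 1, 2, 1]}
mds1 = [1, -1, -1, 1]
mds2 = [1, 2, 3, 4]
result = loadings(freq_table, mds1, mds2)
print(result)
```

In this toy example “my” rises in step with MDS2 and therefore receives a loading of 1 on that dimension, whereas “noise” yields a short arrow and is filtered out.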

Table 4

Loadings in the second MDS dimension of personal pronouns, numerals and synonymous words. A negative value corresponds to an arrow component pointing south in Figure 5, and a positive value corresponds to an arrow component pointing north.

Pronoun | MDS2 loading | Pronoun | MDS2 loading | Synonym | MDS2 loading | Numeral | MDS2 loading

hym | –0.70 | they | –0.14 | knowe | –0.38 | two | –0.62
he | –0.62 | yow | 0.26 | countrey | –0.27 | four | –0.53
thou | –0.49 | me | 0.28 | land | 0.49 | one | –0.52
his | –0.32 | I | 0.32 | wete | 0.66 | thre | –0.51
thy | –0.27 | my | 0.60 | | | |

§25  Finally, we have included a table of the loadings of the second MDS dimension in particular, where the differences between the “books” based on Old French sources in the lower half of the bi-plot and the “books” based on Middle English sources in the upper half of the bi-plot are particularly visible (Table 4). This table shows that the “books” based on Middle English sources make a comparatively greater use of pronouns associated with direct speech (such as “my”, “me” and “I”), while the “books” based on Old French sources make a greater use of “hym”, “he”, “his” and “they”. Interestingly, where the latter “books” make frequent use of “thou” and “thy”, the former “books” make frequent use of “yow” and “your”. To give two examples of synonym usage, the “books” based on Middle English source texts make frequent use of the words “wete” and “land”, while the “books” based on Old French source texts make frequent use of the words “knowe” and “countrey”.

4. Discussion

§26  We will now relate our findings to the four questions raised in our introduction.

1. What type of written source (if any) did Malory use for his Book IV, “The Tale of Sir Gareth”?

§27  Failing the discovery of a lost romance text containing a clear source or analogue story to the “Tale of Sir Gareth”, this question cannot be fully resolved and must remain to some degree a point of speculation. However, it is nonetheless possible to draw conclusions about the nature of the source of Gareth, based on how this section of Malory’s work compares to the other sections of his work, for which his sources are known. If we accept the premise discussed above, that the linguistic differences between the “books” of Malory’s work are the result of his use of different sources for different sections of his Morte, then we can also infer that those of his books with a greater linguistic resemblance to each other are based on sources that linguistically resemble each other. As we have seen, this is confirmed in our MDS graphs, which show the samples taken from sections that have French sources (Books I, III, V, and VI, marked in orange, red, yellow and pink respectively) clustering together in the lower right-hand quadrant of the graph in Figure 1. The samples taken from the “Tale of Sir Gareth” (Book IV, marked in grey) emerge as part of this cluster, an affinity that is further confirmed by the MDS generated for two-word tokens (Figure 3), in which the high-usage vocabulary of the “Tale of Sir Gareth” is represented as most resembling that of the tales of “Sir Tristram” and “Sir Lancelot”, both tales based primarily on Old French source texts. Judging solely by the usage of high-frequency words analyzed here, our results therefore show that “The Tale of Sir Gareth” is linguistically significantly more similar to those “books” of the Morte that have a French written source. We can therefore conclude that Malory is most likely to have been working with a now-lost written source in French when writing his “Tale of Sir Gareth”. This supports Norris’ hypothesis of a lost French source (Norris 2008, 159) and is, from a purely pragmatic point of view, all the more convincing in that it suggests that Malory’s approach to writing this tale was consistent with his approach to writing the other seven sections of his work, in that he worked with a written source and that, as in five of the seven other sections of his work, that source was French.

2. How unified are the different sections of the Morte?

§28  Since literary unity remains a shifting goalpost that must always be interpreted in relation to other comparable collections, this is a question that cannot be resolved by our digital analysis. However, our analysis does show that the different “books” of the Morte are linguistically distinct from each other. Given that the different “books” are adaptations of different major source texts, the fact that they are linguistically heterogeneous may suggest that the different linguistic features of the various source texts have not been fully homogenized into a unified text corpus, but instead remain traceable in the linguistic differences between the different “books”. The fact that the group of “books” based on Middle English source texts are more similar to each other than the group of “books” based on Old French source texts confirms that suggestion to some degree.

3. Was Malory as a reviser responsible for the differences between the Caxton and Winchester versions of his Morte?

§29  One of the weaknesses of our study is that in its current form it only draws on the Caxton version of the Morte. This is an issue that we plan to remedy in a follow-up study that will make a comparative analysis of high-frequency words and two-word phrases in both the Winchester and the Caxton version of the Morte. Until this comparative study has been undertaken, our conclusions regarding the “Roman War” section of the Morte must remain partial and preliminary. Nonetheless, our present results provide strong confirmation of the discrepancies between the “Roman War” section contained in the Caxton edition and the remaining sections of that edition. As the first dimension of the MDS graphs illustrates most clearly, the “Roman War” samples analyzed by our MDS of high-frequency words stand clearly apart from the samples drawn from all the other sections, clustering at the extreme left-hand side of the graph. Nor does our analysis show the “Roman War” section overlapping with the two other tales based on major English-language sources. The samples drawn from Books VII and VIII, which are based on the Stanzaic Morte Arthur and, to a lesser degree, the Alliterative Morte Arthure, cluster together in the upper right-hand section of the MDS graph, at quite some distance from the samples drawn from Book II (the “Roman War” section). Taking a simplistic view of the longstanding controversy surrounding the authorship of the Caxton edition of the “Roman War” section, the most obvious explanation for the strong linguistic discrepancy between the “Roman War” section of the Caxton Morte and the other sections would be that this section was not written by Malory alone or possibly was not written by Malory at the same time as the other sections. This becomes even more likely in light of the fact that the “Roman War” section of the Morte is also the only one of Malory’s “books” that does not exhibit marked linguistic similarities to the other sections that were based on source texts in the same language (Middle English and Old French respectively).

4. What was Malory’s role in creating the Morte, and where should he be placed on the continuum between “faithful translator” and “original author”?

§30  The process of translating, selecting, arranging and formulating a narrative based on another text is a complex one, whose nuances cannot be fully grasped by quantitative methods. In this regard, quantitative assessment of linguistic differences between the tales remains a blunt instrument for understanding the role of an adaptor. It follows that our results are likely to be most illuminating when supplemented by more in-depth qualitative analysis that goes beyond the scope of this study.

§32  Nonetheless, our results do provide a concrete basis for considering the ways in which Malory’s work may be reproducing not only thematic, but also linguistic aspects of his source texts. Thus the fact that the high-frequency words used in those sections of Malory’s work that are based on Old French sources resemble each other more closely than the high-frequency words used in those sections of the Morte that are based on Middle English sources suggests that Malory was not only using the plots and storylines of his sources, but that his language, too, has been formed by his interaction with his source texts. This implies a transitory imprint of linguistic influences from his source texts on Malory’s authorial style, casting a new and different light on his authorial process. While Malory’s style of adaptation is perhaps most notable for his condensation and abbreviation of wide-ranging material (Kennedy 1981, 28), this imprint reveals the tangible effects of Malory the reader engaging with the language of his source texts in the course of rewriting them. Indeed, such an imprint implies that the process of rewriting texts cannot be neatly divided into the faithful translation of a text on the one hand, and the “original” reworking of a storyline on the other. Rather, the adaptor’s process of “borrowing” a plotline from a source text cannot be detached from his or her active engagement with the language of the source text in question and the subsequent momentary or lasting impression that engagement might leave on the adaptor’s authorial voice and identity.

§33  In light of this, the term “adaptor” seems particularly apt for Malory, in that it emphasizes the reshaping of material for a new purpose or context but gives due weight to the extent to which the “material itself”, in both its abstract narrative content and the language in which that content is expressed, often remains recognizably the same. It follows that in the case of Malory’s adaptation of his sources we cannot speak of “old wine in new skins”, as such images of adaptation create a dichotomy between the plot as the “substance” of a source text and the language or formal structure of the source text as the “exchangeable receptacle” in which that substance is contained. Instead, our study suggests that the language and content of his sources work together to shape Malory’s Arthuriad.

Additional Files

The additional files for this article can be found as follows:

Supplementary Figure S1

The first two MDS dimensions of three different types of tokenization (“simple stemming”, “Porter-like stemming” and “dictionary lemmatization”) that were combined with three different types of distance functions. Colours and symbols have the same meaning as in Figure 1. DOI: https://doi.org/10.16995/dm.86.s1

Supplementary Figure S2

The first two MDS dimensions of three different types of tokenization (character n-grams of length 3, 4, and 5) that were combined with three different types of distance functions. Colours and symbols have the same meaning as in Figure 1. DOI: https://doi.org/10.16995/dm.86.s2

Supplementary Figure S3

The first eight dimensions of a multi-dimensional scaling of one-word tokens in the Morte (obtained by “simple stemming” and Burrows’ Delta distance). The MDS is the same as shown in Figure 1, only here every dimension, from the first to the eighth, is shown as an independent row. The solid lines that are overlaid onto the circles representing the samples are drawn at the average value of each “book” in that dimension. DOI: https://doi.org/10.16995/dm.86.s3
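
For readers who wish to experiment with the kind of analysis summarized in these figures, the following is a minimal sketch, not the code used in this study. It assumes whitespace tokenization in place of the stemming described above, computes Burrows' Delta as the mean absolute difference of z-scored word frequencies, and substitutes classical (Torgerson) MDS for whichever MDS variant produced the published graphs. The function names, the toy samples and the “book” labels A and B are all hypothetical and serve only as illustration.

```python
from collections import Counter

import numpy as np


def relative_frequencies(samples, n_words=5):
    """Relative frequencies of the n_words most frequent words across all samples."""
    counts = [Counter(s.lower().split()) for s in samples]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = [w for w, _ in total.most_common(n_words)]
    freqs = np.array([[c[w] / sum(c.values()) for w in vocab] for c in counts])
    return vocab, freqs


def burrows_delta(freqs):
    """Pairwise Burrows' Delta: mean absolute difference of z-scored frequencies."""
    std = freqs.std(axis=0)
    std[std == 0] = 1.0  # guard against words with identical frequency everywhere
    z = (freqs - freqs.mean(axis=0)) / std
    return np.abs(z[:, None, :] - z[None, :, :]).mean(axis=2)


def classical_mds(dist, n_dims=2):
    """Classical MDS: eigendecomposition of the double-centred squared distances."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))


def book_means(coords, labels):
    """Average position of each book's samples in every MDS dimension."""
    return {
        book: coords[[i for i, lab in enumerate(labels) if lab == book]].mean(axis=0)
        for book in sorted(set(labels))
    }


# Invented toy samples standing in for text samples from two hypothetical "books".
samples = [
    "the king and the knight rode to the castle",
    "and the king said to the knight",
    "then he smote him and he fell",
    "then he rode and he smote",
]
labels = ["A", "A", "B", "B"]

vocab, freqs = relative_frequencies(samples)
delta = burrows_delta(freqs)
coords = classical_mds(delta, n_dims=2)
means = book_means(coords, labels)
```

The per-book means returned by `book_means` correspond to the solid lines described for Supplementary Figure S3; a real analysis would use far longer samples and a much larger set of high-frequency words than this toy example.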