What it is and what it does
§ 1 Textual scholars who deal with variant texts, as medievalists often do, frequently present their work as a synoptic view of the texts under study. Such works are commonly published online as HTML pages. In some cases these representations become quite large, which makes them cumbersome to navigate. Within the confines of a web browsing session it is also hard to give the user an overview of such texts, of their lacunae and, in general, of the relations between parts of different manuscripts. The aim of CATview is to provide scholars with a simple but powerful tool that displays an interactive summary of the texts.
§ 2 CATview (Colored & Aligned Texts view) offers an interactive visualization pane to ease the navigation of HTML pages that contain texts aligned among multiple witnesses. Slightly ironically, CATview provides a so-called bird's-eye view of the manuscripts and their relations. The CATview widget does not show the details of the manuscripts, only the alignment between them: where the manuscripts match each other and how closely.
§ 3 CATview focuses on a single functionality: providing an interactive overview of the alignment, depicted in Figure 1. CATview is meant to be used in the following way: first the scholar creates an HTML page containing the texts they want to show, then they attach the CATview widget to that page, enhancing the usability of the displayed texts. The widget is usually placed horizontally, before the texts (as shown in Figure 2), or vertically, alongside the texts (as shown in Figure 3). CATview relieves scholars of the job of developing their own navigational widgets, a mundane yet important task that has little relation to scholarly work.
§ 4 Practically speaking, scholars have complete freedom in choosing how to encode the texts, how to divide them into segments, how to align them, whether to use alignment tools such as Juxta (Juxta 2014) or CollateX (Dekker et al. 2014), how to turn the results into HTML, etc. All the scholars have to do is convert the alignment data, once it has been produced, into the format understood by CATview (delineated in the Section Technical aspects below) and add the CATview JavaScript files to their HTML pages. In return, CATview will provide an attractive and functional navigational widget for their texts.
§ 5 Focusing on a very precise task (i.e. the visualization of results, not their creation or manipulation) allows CATview to be small, easy to understand, and applicable in many different projects.
Use, visualization and interaction
§ 6 The main functionality of CATview is to show a broad view of the alignment between many different text witnesses. This implies showing which parts are missing from certain witnesses, but also showing how similar the aligned parts actually are.
§ 7 CATview accomplishes its task with elegance. A series of boxes is used to represent the witnesses: each box represents one segment in one particular witness. These boxes are then displayed as a grid in which each row represents a witness and each column represents an aligned passage. Passages that are not present in a certain witness are shown as gaps between the boxes. This visualization allows the scholar to get a quick overview of the similarities among a group of witnesses.
§ 8 Another important functionality of CATview is its scroll on click feature. With the proper configuration, CATview can not only show the alignment, but also offer a modicum of interaction. When a user clicks on a box, the page is scrolled to the point where the corresponding segment occurs. It is then easy to highlight the clicked segment, for example with a red outline, by including a CSS rule like the following in the website stylesheet:
.segment:target { outline: medium solid red; }
§ 9 The CATview widget does not impose a particular orientation: it can be placed either horizontally or vertically, allowing its use in web pages that are supposed to be scrolled in any direction.
§ 10 Another built-in feature of CATview that can easily be enabled with a little configuration is the so-called level of detail, i.e. the ability to zoom in and out of the document map. This feature is necessary if CATview is to be used with long documents. CATview occupies the complete width of the page (when placed horizontally across the page); in documents with many segments this results in thin, unreadable boxes. Using the adjustable level of detail, it is possible to zoom in to an acceptable level. By supporting variable levels of detail, CATview is able to show both an overview of the alignment and all the important details.
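The arithmetic behind this problem can be illustrated with a minimal sketch. It assumes a horizontally placed widget whose width is shared equally by all segments and ignores the gaps between boxes; the function names `boxWidth` and `minimumZoom` are hypothetical and are not part of CATview.

```javascript
// Hypothetical helpers, not part of the CATview API.
// Width in pixels of one box when `segmentCount` boxes share the widget
// width at a given zoom factor (gaps between boxes are ignored).
function boxWidth(widgetWidth, segmentCount, zoom = 1) {
  return (widgetWidth * zoom) / segmentCount;
}

// Smallest zoom factor (never below 1) at which every box is at least
// `minBoxWidth` pixels wide.
function minimumZoom(widgetWidth, segmentCount, minBoxWidth) {
  return Math.max(1, (minBoxWidth * segmentCount) / widgetWidth);
}

console.log(boxWidth(1200, 600));       // 2
console.log(minimumZoom(1200, 600, 8)); // 4
console.log(boxWidth(1200, 600, 4));    // 8
```

Under these assumptions, a document with 600 segments in a 1200-pixel-wide widget yields boxes only 2 pixels wide; a fourfold zoom brings them to 8 pixels, at the cost of showing only a quarter of the segments at once.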
§ 11 Another useful feature of CATview is its support for showing search results. In the default configuration, search results are shown as highlighted boxes at all levels of detail, as shown in Figure 4. This makes it easy to see in which manuscripts and in which sections the searched terms have been matched, especially when combined with the scroll on click feature. As with the other functionalities discussed in this review, CATview only provides a mechanism to show the search results; it does not implement any kind of search itself. This means that an appropriate linkage between CATview and the project's search program (e.g., an ElasticSearch server) must be established as a condition of implementing this functionality.
§ 12 Another important aspect of the CATview visualization is that the color of the boxes can be used to convey further information beyond search results. For example, different colors can be used to show the degree of similarity between various versions of the same passage across manuscripts, using darker colors for very similar passages and lighter shades for passages with few similarities, as shown in Figure 5. It must be noted that CATview only provides the mechanisms to communicate additional information via box colors. How this mechanism is used in practice is for the scholar to decide, depending on his or her specific needs and research questions.
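A minimal sketch of such a color mapping follows. It assumes similarity values between 0 and 1, with -1 marking a missing segment (the convention described in the Section Technical aspects). The function name `similarityToColor` and the choice of an HSL ramp are illustrative assumptions, not part of CATview.

```javascript
// Hypothetical helper, not part of the CATview API: maps a similarity
// value to a CSS color. Higher similarity gives a darker shade.
function similarityToColor(value, hue = 210) {
  // -1 marks a missing segment: return null so the caller draws no box.
  if (value === -1) return null;
  if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError("similarity must be -1 or lie between 0 and 1");
  }
  // Lightness runs from 85% (no similarity) down to 25% (identical).
  const lightness = Math.round(85 - value * 60);
  return `hsl(${hue}, 60%, ${lightness}%)`;
}

console.log(similarityToColor(1));   // hsl(210, 60%, 25%)
console.log(similarityToColor(0.3)); // hsl(210, 60%, 67%)
console.log(similarityToColor(-1));  // null
```

A single hue with varying lightness keeps the darker-means-more-similar reading intact; a project with other research questions might prefer a categorical palette instead.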
§ 13 Given all these points, it is accurate to say that CATview does a good job at its intended task: providing an interactive overview of the alignment between many witnesses.
Technical aspects
§ 14 CATview is distributed as a single JavaScript file. In addition to the main file, one needs two other JavaScript components: a copy of the D3 library and a setup file.
§ 15 The D3 library is what CATview uses internally to draw shapes on the screen. D3 is a well-known free and open-source library used by thousands of other visualization projects.
§ 16 The setup file is used to connect CATview to a specific set of witnesses. There is no standard setup file because every publication is different. The setup file is itself a JavaScript file in which the various components are instantiated, set up and put in relation with the rest of the page elements. This is done through the commands made available by the CATview API. A guide on how to use these commands is provided as part of the CATview technical documentation.
§ 17 The most important part of the setup file is that which establishes edges, that is, the records used to describe the alignment. More precisely, the edges are numerical values that indicate whether two fragments are aligned and also how similar they are. The value 1 is used to denote that the text of a certain segment in a particular witness is identical to the corresponding segment in the reference text, 0 denotes a complete difference, and -1 indicates that a certain segment is missing from that particular witness. Intermediate values such as 0.2 or 0.8 are used to indicate the degree of similarity between a section in a particular witness and in the reference text.
§ 18 The meaning associated with these values is decided by the scholar who sets up CATview. For example, if there is a reference text, these values could be calculated based on the similarity between that text and the particular witness. But this is just one of the possibilities. The fact that CATview does not impose a particular meaning on these values allows scholars to be innovative in their use. For example, they could use 1 to denote what they deem the most correct variant (and not the most correct witness) and use the decimal numbers to denote similarity to this variant. As with the other features of CATview, what is offered here is just a mechanism: its configuration and the meaning to be conveyed through it are left to its users.
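One possible way of deriving such values is sketched here, assuming that a reference text exists and that similarity is measured as a normalized Levenshtein (edit) distance. Both the measure and the helper names `edgeValue` and `edgeRow` are assumptions made for illustration; CATview itself does not prescribe how the values are obtained.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: one edge value for a witness segment compared with
// the reference segment. null or undefined means the segment is missing.
function edgeValue(referenceText, witnessText) {
  if (witnessText == null) return -1;
  const longest = Math.max(referenceText.length, witnessText.length);
  if (longest === 0) return 1; // two empty segments count as identical
  return 1 - levenshtein(referenceText, witnessText) / longest;
}

// One row of the edges table: one value per witness for a single segment.
function edgeRow(referenceText, witnessTexts) {
  return witnessTexts.map((text) => edgeValue(referenceText, text));
}

console.log(edgeRow("abcd", ["abcd", "abcf", "wxyz", null]));
// [ 1, 0.75, 0, -1 ]
```

In a real project the comparison would more likely operate on normalized words than on raw characters, and the output of a collation tool could replace the distance function entirely.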
§ 19 Let us see a practical example. Suppose we have four witnesses: W1, W2, W3 and W4. These witnesses are divided into five segments. The following code shows the edges associated with a hypothetical alignment.
① edges = [
② // W1, W2, W3, W4
③ [ 0, 1, 1, 0], // edges for segment 1
④ [ 1, 1, 1, 1], // " for segment 2
⑤ [ 0, 0, 0, 0], // " for segment 3
⑥ [ 0, 1, 1, -1], // " for segment 4
⑦ [ 1, 1, 0.3, 1] // " for segment 5
⑧ ]
§ 20 In line 3 we say that the first segment is identical in W2 and W3, while W1 and W4 contain different texts. In line 4 we state that all the witnesses contain the same text in their second segment, while in line 5 we say that all the witnesses contain different texts. Line 6 shows a more complex situation: the text in W1 is not similar to any of the others, the texts of W2 and W3 are identical and W4 completely lacks the fourth segment. The last line states that W1, W2 and W4 contain the same text in segment 5, and that W3 contains a text that is only 30% similar to that found in the other witnesses. As already said, the edge data must come from an external source, as CATview does not provide an alignment analysis tool.
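The same reading of the table can be produced mechanically. The following sketch takes the example edges and groups the witnesses of each segment by the kind of value they carry; the helper `summarizeSegment` and its category names are hypothetical and serve only to illustrate how the values are interpreted.

```javascript
// The example alignment from the text: one row per segment, one column
// per witness.
const names = ["W1", "W2", "W3", "W4"];
const edges = [
  [0, 1, 1, 0],
  [1, 1, 1, 1],
  [0, 0, 0, 0],
  [0, 1, 1, -1],
  [1, 1, 0.3, 1],
];

// Hypothetical helper: groups the witnesses of one segment by edge value
// (1 = identical, 0 = different, -1 = missing, anything else = partial).
function summarizeSegment(row, witnessNames) {
  const summary = { identical: [], partial: [], different: [], missing: [] };
  row.forEach((value, i) => {
    if (value === -1) summary.missing.push(witnessNames[i]);
    else if (value === 1) summary.identical.push(witnessNames[i]);
    else if (value === 0) summary.different.push(witnessNames[i]);
    else summary.partial.push(witnessNames[i]);
  });
  return summary;
}

edges.forEach((row, i) => {
  console.log(`segment ${i + 1}:`, JSON.stringify(summarizeSegment(row, names)));
});
```

For the fourth segment, for instance, the summary lists W2 and W3 as identical, W1 as different and W4 as missing, which matches the description given above.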
§ 21 Going back to the setup file, it is also the place where the behaviour of CATview is programmed and linked to the actual document. What happens when the user clicks on the box that represents a certain section of a certain witness? Most people would expect the text to move to that witness, and the selected section to be highlighted. But moving the text, or highlighting a section of it, is a very document-specific operation that depends on the way the sections have been created and on how the text has been encoded in HTML. For this reason, CATview provides only a generic infrastructure and expects its users to write the glue code in the setup file.
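To give an idea of what such glue code might look like, here is a minimal sketch. It assumes that every segment in the HTML page carries an id built from the witness name and the segment number (for example id="W2-seg4"); this id scheme and the function names are hypothetical. How the handler is registered with the widget is deliberately left out, since that depends on the CATview API.

```javascript
// Hypothetical id scheme: the element holding segment 4 of witness W2
// is assumed to carry id="W2-seg4" in the HTML page.
function segmentId(witness, segmentNumber) {
  return `${witness}-seg${segmentNumber}`;
}

// Hypothetical click handler. Setting the URL fragment makes the browser
// scroll to the element with that id and makes that element match the
// CSS :target selector. The location object is passed in so that the
// function can also be exercised outside a browser.
function jumpToSegment(loc, witness, segmentNumber) {
  loc.hash = "#" + segmentId(witness, segmentNumber);
  return loc.hash;
}

// In a browser one would pass window.location; a plain object stands in here.
const fakeLocation = { hash: "" };
console.log(jumpToSegment(fakeLocation, "W2", 4)); // #W2-seg4
```

Relying on the URL fragment means that a stylesheet rule based on :target, such as the one shown earlier in this review, highlights the segment without any further scripting.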
§ 22 The installation of CATview is quite easy: it just consists in copying all the JavaScript files to the web server. Although the installation instructions do not suggest it, it is possible to embed CATview in an existing HTML page without changes to its structure. This means that it is possible to enhance existing pages with CATview with little effort.
§ 23 As a last point, it must be noted that CATview itself is a free and open-source project, licensed under the MIT License (MIT 1998). Practically speaking this means that it is possible to integrate CATview in any project without worrying about its legal status, as long as the license text is left untouched and the authors are properly credited.
Remarks, limitations and suggestions for use
§ 24 In summary, it can be said without doubt that CATview achieves its intended goal of enhancing the display of synoptic views of manuscripts.
§ 25 There are, however, a few limitations that must be taken into account by those who would like to integrate it in their projects.
§ 26 First of all, it must be clear, and this point has been reinforced many times in this review, that CATview provides only a set of mechanisms, not a full solution to the problem of showing alignments. For a start, the alignment data that CATview requires must be computed before it can do its work. Similarly, while CATview provides a way to show search matches, the implementation of the search itself (a non-trivial task) is left to the user. Another thing to take into account is the fact that the text of the manuscripts must already be in HTML format. The reviewer suggests that prospective users of CATview set up an automated workflow (or at least a script) that takes, for example, the source TEI/XML files with the marked-up text of the witnesses together with the alignment data, and automatically generates the HTML file and the setup JavaScript file and ingests the data into a search engine like ElasticSearch or BaseX.
§ 27 Another key limitation is the fact that the visualized grid structure fits only alignments that are very regular and can be expressed in the form of a table. This has two main consequences. First, it is hard to show transpositions in CATview. With custom code it would be possible to draw over the widget to connect the boxes that identify the transposed segments. This functionality is, however, not included in the current version of CATview. Second, given that all the boxes have the same shape and size, they should be used to represent segments of approximately the same length. Should the passages be very different in length, the visualization offered by CATview would be misleading.
§ 28 A further weak point of CATview is the fact that all the manuscripts must be shown at the same time and there is no way, short of writing a sizable amount of code, to show only a subset of the witnesses. In the experience of the reviewer, CATview becomes confusing to use when more than five witnesses are examined at the same time. This could be a severe limitation for projects that deal with more than five witnesses. On the other hand, CATview is supposed to be used as part of synoptic displays of witnesses, and classic column-based techniques for synoptic display are known to be inadequate when used with more than a handful of witnesses (Zappavigna 2011).
§ 29 To conclude, CATview provides an easy-to-implement way to add a useful navigation widget for projects with few witnesses and very regular alignments.
Works cited
Juxta. 2014. Available online at http://www.juxtasoftware.org, last accessed 2016-04-05.
Dekker, R. H., Dirk van Hulle, Gregor Middell, Vincent Neyt, and Joris van Zundert. 2014. Computer-supported collation of modern manuscripts: CollateX and the Beckett Digital Manuscript Project. Literary and Linguistic Computing 30(3). doi:10.1093/llc/fqu007
Pöckelmann, Marcus, André Medek, Paul Molitor, and Jörg Ritter. 2015. CATview - Supporting The Investigation Of Text Genesis Of Large Manuscripts By An Overall Interactive Visualization Tool. Digital Humanities, DH2015, Sydney, Australia.
MIT. 1998. The MIT License. Available online at https://opensource.org/licenses/MIT, last accessed 2016-04-05.
Zappavigna, Michele. 2011. Visualizing logogenesis: preserving the dynamics of meaning. In Semiotic Margins: reclaiming meaning, edited by Shoshana Dreyfus, Susan Hood, Maree Stenglin.