Introduction
§ 1 Compliance with open standards in software and multimedia projects is an excellent thing for the projects' users, and so it is often promoted as a virtue that programmers and digital content creators should strive for. Developers have not always shared this concern, unfortunately, with the result that the users who expected to make use of some electronic resource in their research are occasionally prevented from finding all the answers they had sought. What is the end user to do in such a situation, besides write to the publisher and ask that the needed feature be added in the next version? Sometimes the user can do more, perhaps much more, depending on the program or project. In the case of one invaluable database, the Illustrated Incunable Short Title Catalog on CD-ROM (IISTC) (British Library 1998), there is a wealth of useful information trapped behind an inadequate user interface. Medievalists working on fifteenth-century literature or early printing have two options: they can practice the patience of a recusant or they can seek a radical solution, namely, exporting all 28,360 records, extracting the necessary information, and importing it into the fields of a standards-compliant database. The questions about early printing that can then be more easily answered illustrate one reason why standards compliance is so important for humanities computing projects: designers and developers can never anticipate all the research inquiries that scholars may wish to pursue.
Problems
§ 2 As a catalog of all known incunable editions with an extensive if not yet complete list of known copies, the IISTC comes closer than any other presently available reference work to being a worldwide incunable census. It is therefore an essential tool for research libraries and scholars in many fields. As a computer database rather than a printed catalog, the IISTC promises quick answers to scholars' questions. In a thorough review article, Paul Needham praises the IISTC as a milestone in the history of incunable bibliography but also identifies numerous deficiencies in the database and particularly in the idiosyncratic user interface (Needham 1999). The application of computer technology to bibliography makes the IISTC a "revolutionary publication in incunable study", by "allowing searches that from printed reference works...can be made only laboriously, or for practical purposes cannot be made at all" (479). And yet, because of the limitations of the software provided for searching the IISTC database, several types of inquiry remain laborious, impractical, or at times impossible. Needham specifically mentions the following shortcomings:
- One can save only entire records, rather than exporting only particular fields (478).
- The interface software is unstable, particularly when conducting complex searches (483).
- Much useful information is found not in the incunable records but in various unexportable lists whose data cannot easily be viewed. Similarly, there is no way with the current software of the IISTC to select a city, and then view or capture a list of all the recorded printing shops of the city (486 n. 58).
- Editions with unsigned dates have been assigned a Year of Publication through an opaque and problematic process, so that the field is inadequate for even the roughest of statistical analysis (489). Editions assigned to a range of years, such as 1477-79, will only turn up in a search for the first or last year in the range, but not under 1478 (520).
- Libraries in some countries are recorded inconsistently, so that it is often difficult or impossible to compile, with a single search, all the incunables of a given library (498).
- The asterisk denoting that Hain had personally inspected a particular imprint is indistinguishable from the asterisk-as-a-wildcard character in bibliography searches; consequently, the IISTC's software silently mishandles Hain's asterisks (499).
- The drop-down lists in the Bibliography field are in strict alphabetical order, making it impossible to search catalog numbers in numerical sequence. This is a drawback for works such as Kurt Ohly and Vera Sack's catalog of Frankfurt incunables, and a substantial problem for searching the older standard Hain-Copinger-Reichling series (498-99).
- The Notes field can be searched only through All Fields searches (501).
- There is no easy way to search for only signed or unsigned cities, printers, or dates. That is, the user interface search capabilities cannot reliably differentiate between editions that bear the name of a Basel printer and editions that scholars have attributed to a Basel printer (520).
- Many of these problems are related to the search software's failure to implement true string searches, instead treating William Caxton as William AND Caxton with unpredictable results (520).
§ 3 In addition, although the still-incomplete IISTC is unmatched as a worldwide incunable census by any other resource, including the Gesamtkatalog der Wiegendrucke, it lists the present-day locations of incunable editions but does not display the total number of copies. This is not a serious problem if one can see at a glance that there are only one or two copies of a given incunable, but rather irksome if there are dozens, and a huge handicap if one wishes to compare the number of recorded copies for more than a few editions. To answer the question, "For which incunable does the IISTC record the largest census?" it is best to know the answer beforehand: Anton Koberger's edition of the Latin Nuremberg Chronicle, 12 July 1493 (Goff S-307), and even then one must calculate the total number of copies by hand (Needham arrives at more than 780 [Needham 1999, 497]). Even an experienced scholar of fifteenth-century printing might have difficulty naming the second-, third-, or tenth-largest incunable census, or the largest census for works printed in German or another of the vernacular languages. This information lies within the database, but the IISTC interface prevents users from accessing it.
A solution
§ 4 Repetitive tasks, such as adding up the number of copies for 28,360 editions, are best left to computers, and therein lies the solution, which has but four essential steps:
- Export all 28,360 records in the IISTC as a plain text file;
- Transform the text file into a standard file format;
- Import the information into a database or spreadsheet application;
- Search the new database using standardized and well-documented search tools.
While many different approaches and various software packages could be used to implement these steps, the following discussion is based on readily available software and consumer applications that are practically standard issue in the computing infrastructure of many colleges and universities. (For a full account of the process of importing the IISTC records into a database, see appendix 1.)
§ 5 Because the IISTC allows any number of records to be selected and exported, a user could choose to export all 28,360 records to a plain-text file, at least in theory. The operation itself can take hours or even days, perhaps as a legacy of the IISTC's providing only a 16-bit Windows interface coded in Visual Basic 3. The minimal requirements of the IISTC software mean that it can run on quite antiquated hardware, although at the cost of increased liability to crash and increasingly uncertain interoperability with a library's newer computer infrastructure. Exporting all the records is nevertheless possible and, as the following will show, quite useful.
§ 6 The export of every IISTC record results in a very long list of records such as the following:
The Illustrated ISTC (2nd Edition)
Author: Aesopus
Title: Vita, after Rinucius, et Fabulae, Lib. I-IV, prose
version of Romulus [German]. Add: Fabulae extravagantes.
Fabulae novae (Tr: Rinucius). Fabulae Aviani. Fabulae collectae
[German] (Tr: Heinrich Steinhöwel). Leonardus Brunus Aretinus:
De duobus amantibus Guiscardo et Sigismunda [German] (Tr:
Nikolaus von Wyle)
Imprint: [Augsburg: Anton Sorg, about 1479]
Language: German Format: f°
Notes: General+Production:
Woodcuts
Cataloguing Source: Goff A120
Bibliography: HR, Supplement 333; Schreiber, Manuel 3028a;
Schramm IV p. 50; GW 353
Locations:
British Isles: London, Victoria and Albert Museum
USA: LC(R); MMu(P)L
Germany: Dresden KupferstichKab
ISTC No: ia00120000
(c) British Library Board and (c) Primary Source Media
One could cut and paste all of the information by hand from this record into a database or spreadsheet table in which Author is one column, Title another, and so on. The work would be arduous and repetitious, and is therefore best left to a computer. Fortunately, well-documented and accessible scripting languages such as Perl are exactly suited to this task. (For Perl software and documentation, see http://www.cpan.org; http://www.perl.org; http://www.activestate.com.) One need only tell the computer:
Read through all 495,923 lines in the exported text file; whenever you find a line that begins with Title:, save everything between the non-printing tab character and the end of the line; now look for a line that begins with ISTC No:, and do the same. Finally, print the ISTC number as an index, then a tab character to separate the fields, then the title, and then a new line character. And then get back to work!
The script, in Perl as written by a medievalist (and explained more fully in appendix 2), might look something like this:
$batch="istc.txt";
open BATCH, $batch or die "Cannot open $batch for read:$!";
while (<BATCH>) {
    if (/^Title:\t(.*?)$/) {
        $match = $1;
        $hit=1;
    }
    if (/^ISTC.*(i.\d{8})/ and ($hit == 1)) {
        $hit = 0;
        $istc_number = $1;
        print "$istc_number\t$match\n";
    }
}
That is, if the entirety of the IISTC is exported as plain text to the file istc.txt and the Perl script invoked as written, it will produce a very long list that begins in the following way:
ia00000500 Orhot Hayyim
ia00001000 Abbey of the Holy Ghost
ia00001500 Abbey of the Holy Ghost
ia00002000 Abbey of the Holy Ghost
ia00003000 Abbreviamentum statutorum
ia00004000 Abbreviamentum statutorum
ia00004500 Abbreviamentum statutorum
ia00005000 Abbreviamentum statutorum
ia00005500 Abecedarium
ia00008000 Dialogus in astrologiae defensionem cum vaticinio
a diluvio ad annos 1702. With additions by Domicus Palladius
Soranus
ia00009000 Trutina rerum coelestium et terrestrium. With
additions by Augustinus Beganus and Ludovicus Ponticus
ia00009100 De luminaribus et diebus criticis
If the output is redirected to a file, then one is left at the end with a tab-delimited table containing a list of ISTC index numbers and their corresponding titles, which can be imported into the database application of one's choice. With similar scripts that search not for Title: but for Author: or Imprint:, for example, the rest of the information can be extracted as well and then imported in turn. While the IISTC search interface is idiosyncratic, inadequately documented, and crash-prone, the database industry has spent decades and billions of dollars on standardizing, documenting, and crash-proofing its software.
§ 7 While using another software package to replace the IISTC interface is useful, any spreadsheet or database application will have its own limitations on what it can do with the records in their present form. Opening up the IISTC has the added advantage, however, that the records can be manipulated further. For example, the Imprint field could be split up into city, printer, and date fields; or a flag could be added to mark each as "signed" or "unsigned"; or fields could be created for the first, last, or average of all dates attributed to unsigned imprints. The pattern matching and string manipulation capabilities of Perl are quite robust and can even be made to deal with defective IISTC records. (For one possible implementation of a script to analyze the Imprint field, see appendix 3.) The same kind of manipulation can be done on the Locations field to provide a count of the number of copies identified for each of ten geographic regions, which can in turn be added to yield an overall sum. A discussion of the particular challenges here and sample scripts are provided in appendix 4.
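As a rough illustration of the copy count just described, the following Perl sketch tallies the holdings listed on the Locations lines of a single record. It is not the script from appendix 4: the subroutine name count_copies is hypothetical, and the sketch assumes, for the sake of brevity, that every semicolon-separated entry on a region line stands for exactly one copy, an assumption that the real records may well violate.

```perl
use strict;
use warnings;

# Count semicolon-separated entries on each "Region: ..." line.
# ASSUMPTION: one entry equals one copy; real records need more care.
sub count_copies {
    my (@location_lines) = @_;
    my %by_region;
    my $total = 0;
    for my $line (@location_lines) {
        # split "USA: LC(R); MMu(P)L" into the region and its holdings
        my ($region, $holdings) = $line =~ /^\s*([^:]+):\s*(.+?)\s*$/
            or next;
        my @entries = grep { /\S/ } split /\s*;\s*/, $holdings;
        $by_region{$region} += @entries;
        $total += @entries;
    }
    return ($total, \%by_region);
}

# Locations lines of the Aesop record shown in section 6
my @sample = (
    "British Isles: London, Victoria and Albert Museum",
    "USA: LC(R); MMu(P)L",
    "Germany: Dresden KupferstichKab",
);

my ($total, $regions) = count_copies(@sample);
print "total\t$total\n";
print "$_\t$regions->{$_}\n" for sort keys %$regions;
```

For the sample record this simple rule yields four copies: one for the British Isles, two for the USA, and one for Germany. Whether that rule holds throughout the file is exactly the kind of question that has to be checked against the data.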
Results
§ 8 Is the effort worth it? While learning enough Perl to write the necessary scripts takes some time, it is much more manageable than learning, say, Latin. Whether that is time well spent depends on one's needs, and how much one prefers to let a computer handle repetitive search and tabulation. As noted above, Needham regrets that there is no way to quickly view a list of recorded print shops for a given city using the IISTC (Needham 1999, 497 n. 58), even though the IISTC holds this information. With a database application as an interface, however, one can quickly extract the required information. Thus we can discover that the IISTC records the following printers for Ulm:
Conrad Dinckmut
Conrad Dinckmut?
Hans Hauser
Johann Reger
Johann Reger, for Justus de Albano
Johann Schäffler
Johann Zainer
Johann Zainer, not before 1478
Johann Zainer?
Lienhart Holle
This example, like the others here, was created using Microsoft Access. This software is neither a model of standards compliance nor inexpensive; it is, however, the most widespread of desktop database applications. With its "query by design" functionality, one can graphically select the printers and cities database fields, specify that the latter should correspond to Ulm, and let Access automate the process of generating the correct query statement; the process requires less than a minute to set in motion and just seconds to execute. One can just as easily formulate a more exact question, for example, "For what Ulm printers do we have incunable editions with signed city, printer, and year? How many are there? When were they printed?" By pencil-and-paper methods, the following table would take quite some time to construct, but with a database application just a few minutes or, with experience, seconds:
Printer | Signed editions | First signed year | Last signed year |
Conrad Dinckmut | 31 | 1482 | 1496 |
Johann Reger | 10 | 1486 | 1499 |
Johann Reger, for Justus de Albano | 1 | 1486 | 1486 |
Johann Schäffler | 8 | 1492 | 1499 |
Johann Zainer | 35 | 1473 | 1500 |
Lienhart Holle | 6 | 1482 | 1484 |
As the IISTC records relatively few imprints after 1501, the last signed year does not, of course, indicate that a printer ceased operation around that time. The SQL statement used for the search may seem complicated at first glance, but one does not have to glance at it even a first time, thanks to the query design system of Microsoft Access and other consumer database applications:
SELECT istc.Printer, istc.City,
       Min(istc.first_year) AS MinOffirst_year,
       Max(istc.last_year) AS MaxOflast_year,
       Count(istc.istc_number) AS CountOfistc_number
FROM istc
WHERE (((istc.Flags) Like "+++"))
GROUP BY istc.Printer, istc.City
HAVING (((istc.City)="ulm"));
(Note that the table name is istc, and the relevant fields are Printer, City, first_year, last_year, istc_number, and Flags, the last of which signifies whether the city, printer, and date are signed or attributed.)
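For readers who would rather stay in Perl than open a database application, the same grouping can be sketched in a few lines. This is only an illustration of what the query does, not the method used for the table above: the subroutine name signed_summary, the column order, and the sample rows (which are invented placeholders, not IISTC records) are all assumptions.

```perl
use strict;
use warnings;

# Summarize fully signed editions per printer for one city.
# Each row is [istc_number, city, printer, first_year, last_year, flags];
# this column order is an assumption made for the example.
sub signed_summary {
    my ($city, @rows) = @_;
    my %summary;
    for my $row (@rows) {
        my (undef, $row_city, $printer, $first, $last, $flags) = @$row;
        next unless lc($row_city) eq lc($city);    # HAVING City = "ulm"
        next unless $flags eq '+++';               # WHERE Flags Like "+++"
        my $entry = $summary{$printer}
            ||= { count => 0, first => $first, last => $last };
        $entry->{count}++;                                     # Count(...)
        $entry->{first} = $first if $first < $entry->{first};  # Min(...)
        $entry->{last}  = $last  if $last  > $entry->{last};   # Max(...)
    }
    return \%summary;
}

# Invented placeholder rows, NOT taken from the IISTC
my @rows = (
    [ 'ix00000001', 'Ulm',   'Printer A', 1482, 1482, '+++' ],
    [ 'ix00000002', 'Ulm',   'Printer A', 1496, 1496, '+++' ],
    [ 'ix00000003', 'Ulm',   'Printer A', 1490, 1490, '++-' ],
    [ 'ix00000004', 'Ulm',   'Printer B', 1486, 1486, '+++' ],
    [ 'ix00000005', 'Basel', 'Printer C', 1488, 1488, '+++' ],
);

my $summary = signed_summary('ulm', @rows);
for my $printer (sort keys %$summary) {
    my $entry = $summary->{$printer};
    print join("\t", $printer, @{$entry}{qw(count first last)}), "\n";
}
```

The city comparison is made case-insensitive here so that "ulm" also finds rows recorded as "Ulm"; the placeholder edition with an unsigned date is left out of the count, just as the Flags condition leaves it out of the query.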
§ 9 The preceding table should not be confused with an authoritative statement based on extensive research. It is rather a quickly-constructed summary that provides a first impression of the overall situation of early printing in Ulm, but that is by itself a useful function for a computer database.
§ 10 What if one wanted to see a rough overview of the development in number of editions printed each year? (See, for example, Neddermeyer 1998, 2:609-10.) As noted above, the IISTC Year of Publication field is entirely inadequate for this, and the IISTC does not permit searching of editions with signed dates only. After the IISTC records have been imported into a database, one possibility would be to take the average of dates that appear as [1479-81], or one might choose instead to consider only the 12,072 imprints with a signed date. If one takes the latter option, one can quickly paste the resulting data into Microsoft Excel—another omnipresent if not inherently standards-friendly spreadsheet application—to construct a graph such as the following:
Needham notes that a search of IISTC's Year of Publication field would find a seeming contraction in book printing between 1477 and 1479, but that this reflects idiosyncrasies in the IISTC search software rather than an actual shrinkage in production (Needham 1999, 489). The summary of incunable production for the years 1475-1485 (below) finds that this apparent contraction was indeed spurious, but perhaps not the one from 1482 through 1484, when signed editions decline by 18% over two years (the only decline lasting more than a single year). Additional work is required to determine how widespread this phenomenon was or what its causes might have been (see also Neddermeyer 1998, 1:420-22), but the graph at least provides the right place to start, where the numbers provided by the IISTC search interface do not.
Year | Editions (IISTC) | Editions (signed only) |
1475 | 835 | 242 |
1476 | 589 | 231 |
1477 | 672 | 257 |
1478 | 657 | 266 |
1479 | 563 | 245 |
1480 | 1177 | 285 |
1481 | 734 | 342 |
1482 | 816 | 359 |
1483 | 932 | 334 |
1484 | 717 | 295 |
1485 | 1118 | 312 |
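The claim about the signed editions can be checked mechanically. The following Perl sketch, offered only as an illustration, looks for runs of consecutive year-on-year declines in the signed-only column and reports the percentage change for any run lasting more than one year. The figures are copied from the table; the subroutine names decline_runs and percent_change are hypothetical, and the sketch assumes that the years in the table are consecutive.

```perl
use strict;
use warnings;

# Signed editions per year, copied from the table above
my %signed = (
    1475 => 242, 1476 => 231, 1477 => 257, 1478 => 266,
    1479 => 245, 1480 => 285, 1481 => 342, 1482 => 359,
    1483 => 334, 1484 => 295, 1485 => 312,
);

# Return [start_year, end_year] for every run of year-on-year declines.
sub decline_runs {
    my ($counts) = @_;
    my @years = sort { $a <=> $b } keys %$counts;
    my (@runs, $start);
    for my $i (1 .. $#years) {
        my ($prev, $cur) = @years[ $i - 1, $i ];
        if ($counts->{$cur} < $counts->{$prev}) {
            $start = $prev unless defined $start;
        }
        elsif (defined $start) {
            push @runs, [ $start, $prev ];
            undef $start;
        }
    }
    # close a run that is still open at the end of the data
    push @runs, [ $start, $years[-1] ] if defined $start;
    return @runs;
}

# Percentage change from one count to another
sub percent_change {
    my ($from, $to) = @_;
    return 100 * ($to - $from) / $from;
}

for my $run (decline_runs(\%signed)) {
    my ($start, $end) = @$run;
    next unless $end - $start > 1;    # declines lasting more than one year
    printf "%d-%d: %.1f%%\n", $start, $end,
        percent_change($signed{$start}, $signed{$end});
}
# prints "1482-1484: -17.8%"
```

The single line of output corresponds to the drop from 359 to 295 signed editions, or 64 out of 359, which rounds to the 18% cited above.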
§ 11 And what of the Nuremberg Chronicle? It stands at the head of the list of most-preserved incunables, but what follows it? According to the IISTC, the Nuremberg Chronicle vastly outnumbers its closest competitor:
Author | Abbreviated title | Reference | Imprint | Copies |
Schedel, Hartmann | Liber chronicarum | HC 14508* | Nuremberg: Anton Koberger, 12 July 1493 | 786 |
Aristoteles | Opera [Greek]... | HC 1657* | Venice: Aldus Manutius, Romanus, 1495-98 | 319 |
| Biblia latina... | HC 3173* | [Strassburg: Adolf Rusch, for Anton Koberger at Nuremberg, not after 1480] | 287 |
Politianus, Angelus | Opera... | HC 13218* | Venice: Aldus Manutius, Romanus, July 1498 | 270 |
Euclides | Elementa geometriae... | HC 6693* | Venice: Erhard Ratdolt, 25 May 1482 | 266 |
| Epistolae diversorum philosophorum...[Greek] | HC 6659* | Venice: Aldus Manutius, Romanus, 1499 | 266 |
Firmicus Maternus, Julius | Mathesis (De nativitatibus libri VIII)... | HC 14559* | Venice: Aldus Manutius, Romanus, June and [17] Oct. 1499 | 257 |
Ubertinus de Casali | Arbor vitae crucifixae Jesu Christi | HC 4551* | Venice: Andreas de Bonetis, 12 Mar. 1485 | 252 |
Boethius | Opera | H 3351* | Venice: Johannes and Gregorius de Gregoriis, de Forlivio, 1491-92 | 251 |
Antoninus Florentinus | Summa theologica (Partes I-IV)... | HC 1243* | Venice: Nicolaus Jenson, 1477-80 | 242 |
These numbers should not be understood as the number of copies now existing, nor even as the number of copies recorded by the IISTC. Rather, one has to interpret them as one computer script's interpretation of the IISTC data, which is itself incomplete and sometimes ambiguous. Martin Davies, ISTC general editor, has stated, however, that ISTC data collected as of 1992 would proportionally reflect the total number of surviving copies: "The numbers of copies of any particular edition...must, however, bear a fairly constant relation to the total now extant: the fewer recorded the scarcer an edition will prove to be" (Davies and Goldfinch 1992, 20). If exact precision is essential, then one would do well to verify all figures by hand. By my calculation, the IISTC records over 350,000 individual copies for fifteenth-century editions, or between 65% and 80% of all surviving incunables by various estimates (see Neddermeyer 1998, 1:79). Based on these provisional figures, one would expect a complete census of surviving copies of the Latin Nuremberg Chronicle to have somewhere between 1000 and 1250 copies. Christoph Reske arrived at ca. 900 copies with another estimated 135 in private hands (Reske 2000, CD 275-77), while Paul Needham's ongoing count, largely restricted to copies in public libraries, already approaches 1200 (personal correspondence, cited by permission).
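The extrapolation behind that expectation is simple division, and can be sketched as follows. The sketch assumes that the range was obtained by dividing the 786 copies counted for the Nuremberg Chronicle by the two coverage estimates of 80% and 65%; the subroutine name estimate_total is hypothetical.

```perl
use strict;
use warnings;

# Scale a recorded copy count up by the assumed share of surviving
# copies that the IISTC records (0.65 to 0.80 per the estimates cited).
sub estimate_total {
    my ($recorded, $coverage) = @_;
    die "coverage must be greater than 0 and at most 1\n"
        unless $coverage > 0 and $coverage <= 1;
    return $recorded / $coverage;
}

my $recorded = 786;    # copies counted for the Latin Nuremberg Chronicle
for my $coverage (0.80, 0.65) {
    printf "coverage %.2f: about %.1f copies\n",
        $coverage, estimate_total($recorded, $coverage);
}
```

On these assumptions the division gives roughly 980 copies at the higher coverage estimate and roughly 1,210 at the lower one, which is close to the rounded range given above.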
Conclusion
§ 12 There are countless ways to graph, chart, and tabulate the IISTC data, but those that occur to this author may not be the same ones that would hold the interest of the present reader. The previous examples should be enough to demonstrate the utility of allowing other software applications to cooperate with the IISTC's data. Standard database searches will address most of the shortcomings of the IISTC identified by Needham (for example, escaping the asterisk character so that it is not interpreted as a wildcard in searches). The rest can be addressed to the extent the underlying data allow by some additional script writing. If the effort required is justified, the tools at one's disposal are flexible enough to provide an answer.
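The asterisk problem mentioned above is easy to demonstrate in Perl, where \Q and \E turn a search string into a literal pattern. The following is a minimal sketch rather than a description of how any database handles the matter; the subroutine names are hypothetical, and the reference without an asterisk is an invented variant included only for contrast.

```perl
use strict;
use warnings;

# Return the references that contain $wanted as a literal string.
sub literal_matches {
    my ($wanted, @references) = @_;
    return grep { /\Q$wanted\E/ } @references;
}

# The same search without escaping, shown only for contrast:
# here "8*" means "zero or more 8s", so the asterisk is lost.
sub unescaped_matches {
    my ($wanted, @references) = @_;
    return grep { /$wanted/ } @references;
}

# 'HC 14508' (no asterisk) is an invented variant for the comparison
my @references = ('HC 14508*', 'HC 14508', 'H 3351*');

print "literal:   ", join('; ', literal_matches('HC 14508*', @references)), "\n";
print "unescaped: ", join('; ', unescaped_matches('HC 14508*', @references)), "\n";
```

The escaped search returns only the reference that actually carries Hain's asterisk, while the unescaped search also returns the variant without it.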
§ 13 The preceding discussion may hold broader implications for designers and users of other electronic reference works. The general outlines of the solution offered here may be applicable to other electronic resources: exporting records, manipulating them, and re-importing them into another application is by no means a unique process. Even if the software in question has no export function, more sophisticated programming can always automate manual copying and pasting.
§ 14 An important point of application design is that a tabular view of a database often permits important phenomena to be more easily visualized and defective records to be more easily found. Providing unimpeded access to the data offers maximum flexibility and value for an application's users. Database fields that cannot be directly viewed and whose reliability cannot be easily verified, such as the IISTC's Year of Publication field, are necessarily less useful than they otherwise could be.
§ 15 While a search interface can have many uses, it cannot anticipate every question that might be asked, and so it can aid or supplement but never entirely replace access to the underlying data. Much of the effort required to make the IISTC data accessible to other software applications would not have been necessary if the IISTC had maintained consistent formatting and made use of an open format from the beginning. That it did not, however, does not mean that scholars and other end users have to wait for the British Library to redesign its project. If necessary, standards can also be imposed from below.
Appendix 1: A step-by-step description of opening the IISTC database
§ 16 The following discussion assumes that the user has the IISTC, Microsoft Windows, and Microsoft Office installed on his or her computer. While the IISTC runs only under Windows, similar results should be achievable with any database software.
1. Select all records in the IISTC. First, enter a search on the Search screen that returns all 28,360 records, such as searching for i* in the ISTC Number field. On the List Display screen, click on Select All. Once this choice has been confirmed, the computer may become unusable for an hour or more until the operation has completed.
2. After all records have been selected, click on Export, which is also on the List Display screen. This may also require considerable time before the dialogue box appears. Do not change the export range, but do change the export format to plain text by clicking on the button marked Using Rich Text Format (RTF); once you click on it, the title will change to read Using Plain Text (TXT), which is the desired format. Click on the button marked Export, then select a location for the exported file and give it a name. The examples here assume a filename of istc.txt. The export process may take many hours and tie up all the resources of the computer during that time. The resulting file will be over 22 megabytes in size.
3. The various fields in the IISTC can now be turned into tab-delimited tables one at a time using Perl scripts such as that found in appendix 2 and above. If the script is named title.pl, the output can be redirected to create a file named title.txt:
perl title.pl > title.txt
Otherwise the output will appear on screen. Scripts virtually identical to that found in appendix 2 can be used to create a series of files, each a tab-delimited table containing an ISTC number in one column and one additional field in the other.
4. These tables can now be imported one at a time into a database application. Using Microsoft Access, create a new database file, then open the text files one at a time using the File > Get External Data > Import function. Specify that the file is delimited, and that tabs separate the fields, and that it should be opened in a new table. Import the first field as indexed (no duplicates) and give it an appropriate title, so that the ISTC number can serve as the index of the imported database as well; the next stage of the import process will let you choose the ISTC number as the primary key for the database.
Because some title records are longer than the 255-character limit that Access imposes on text fields, these records will be truncated and an error message will appear. Import the titles as a second table in the same way, but with the title field as a memo data type. The truncated titles can be used when sorting is necessary, while the full memo field will be available when the complete titles are needed, so both are useful.
Import the rest of the text files in the same way, choosing appropriate titles for each table and an appropriate data type for each field. Maintain consistency in the naming of fields.
5. The tables can now be joined one at a time into one large flat-file database table, which can simplify later searches. Select the title table, because it contains all 28,360 records, and, for example, the authors table, which only contains 20,933. Create a query by selecting these two tables; the identical ISTC Number fields are already automatically joined. Right click on the link between the two tables, examine the join properties, and click on the third option: we need all the records from the title table as well as the corresponding records in author. Under the Query menu, select Make-Table Query and choose a name, such as istc2. Running the query will create a new table that includes all the ISTC numbers and titles for all records as well as the author for works that have one. Repeating this process with the newly created table and the table containing the next field to be imported will eventually result in a large table with all of the fields readily accessible in the ISTC database. The table should have 28,360 records after every step.
6. Some of the most useful information in the IISTC requires further analysis of its fields using more Perl scripts. A script to analyze the imprint field is found in appendix 3, while a script to provide a copy count is found in appendix 4. Each of these Perl scripts will create new tab-delimited tables that can be imported into the database by following steps 4 and 5 above.
7. Because the IISTC's Printing Regions function assigns some incunables to more than one region, this information cannot be imported into the same table. If this information is required, the relevant records would have to be exported from the IISTC separately, the ISTC Numbers extracted, and a new table created that does not use the ISTC Number as an index. One can then limit one's searches according to the IISTC's printing regions by searching out only those records in the larger table for which an ISTC number in the regions table is associated with the desired region.
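The join of the title and author tables described above does not strictly require a database application. As a minimal sketch, assuming two tab-delimited tables keyed by ISTC number, the same result can be reached with two Perl hashes: every title row is kept, and an author is added where one exists. The subroutine names read_table and left_join are hypothetical; the sample rows abbreviate records quoted earlier, and the Abbey of the Holy Ghost is assumed to lack an author row purely for illustration.

```perl
use strict;
use warnings;

# Turn "ISTC number<TAB>value" lines into a hash keyed by ISTC number.
sub read_table {
    my (@lines) = @_;
    my %table;
    for my $line (@lines) {
        next unless $line =~ /\S/;                 # skip blank lines
        (my $copy = $line) =~ s/\r?\n\z//;         # strip the line ending
        my ($istc_number, $value) = split /\t/, $copy, 2;
        $table{$istc_number} = defined $value ? $value : '';
    }
    return \%table;
}

# Keep every row of the left table; add the right value where one exists.
sub left_join {
    my ($left, $right) = @_;
    my @rows;
    for my $istc_number (sort keys %$left) {
        my $extra = exists $right->{$istc_number} ? $right->{$istc_number} : '';
        push @rows, join("\t", $istc_number, $left->{$istc_number}, $extra);
    }
    return @rows;
}

# Abbreviated sample rows; the missing author row is an assumption
my $titles = read_table(
    "ia00001000\tAbbey of the Holy Ghost\n",
    "ia00120000\tVita, after Rinucius, et Fabulae\n",
);
my $authors = read_table("ia00120000\tAesopus\n");

print "$_\n" for left_join($titles, $authors);
```

As in the Access query, the number of rows in the result equals the number of rows in the title table, which provides a quick check after each join.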
§ 17 This is by no means the only possible approach towards creating a database from the IISTC records, or even one that is particularly faithful to the ideal of standards compliance. While Microsoft Access has a large install base, it is quite expensive; open-source and standards-compliant database solutions such as MySQL exist, but none yet matches the ease of use of Access. A monolithic flat file may not be the best database for all circumstances. In addition, some questions are still best handled by recourse to further script writing, particularly if further text manipulation is required, such as numerically sorting entries in the bibliographic standard works.
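The numerical sorting of bibliographic references mentioned above can be sketched in Perl as follows. The references are taken from the table of most-recorded editions in § 11; the subroutine names catalog_number and sort_by_catalog_number are hypothetical, and the sketch assumes that the first run of digits in a reference is its catalog number.

```perl
use strict;
use warnings;

# Extract the first run of digits from a reference such as "HC 14508*".
sub catalog_number {
    my ($reference) = @_;
    return $reference =~ /(\d+)/ ? $1 : 0;
}

# Sort numerically by catalog number, falling back to string order on ties.
sub sort_by_catalog_number {
    my (@references) = @_;
    return sort { catalog_number($a) <=> catalog_number($b) or $a cmp $b }
           @references;
}

my @references = ('HC 14508*', 'HC 1657*', 'HC 3173*', 'H 3351*', 'HC 1243*');
print "$_\n" for sort_by_catalog_number(@references);
```

A strictly alphabetical sort would place HC 14508* before HC 1657*; sorting on the extracted number instead restores the sequence of the printed catalog.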
Appendix 2: A Perl script for extracting fields from the istc.txt file
§ 18 The comment lines, which begin with the pound sign, explain the function of each line of code.
# Define the name of the file to search
$batch="istc.txt";
# Open the file, or exit with an error if it cannot be opened
open BATCH, $batch or die "Cannot open $batch for read:$!";
# As long as there are lines in the file left to search...
while (<BATCH>) {
    # ...look for the pattern "Title:<tab character><anything else>"
    # at the beginning of a line
    if (/^Title:\t(.*?)$/) {
        # Save "anything else"...
        $match = $1;
        # ...and set a flag that we've found what we're looking for
        $hit=1;
    }
    # Now, if we have a match already saved, look for the pattern
    # "ISTC" at the beginning of the line, then anything else, then
    # "i" followed by one more character and eight digits; save
    # those ten characters, as that's the ISTC number
    if (/^ISTC.*(i.\d{8})/ and ($hit == 1)) {
        # Reset our flag
        $hit = 0;
        # Assign the ISTC number to a variable
        $istc_number = $1;
        # Print the ISTC number, a tab character, the title, and a
        # new line character
        print "$istc_number\t$match\n";
    }
}
The output, as explained above (§ 6), begins like this:
ia00000500 Orhot Hayyim
ia00001000 Abbey of the Holy Ghost
ia00001500 Abbey of the Holy Ghost
ia00002000 Abbey of the Holy Ghost
This output should be redirected to a file to be saved for further use like this:
perl title.pl > title.txt
Very little needs to be changed in order to extract the rest of the fields. In the line if (/^Title:\t(.*?)$/) {, one need only replace Title with Author, Bibliography, Cataloguing Source, Collective Title, Format, Imprint, Language, Locations, or Notes.
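Rather than editing the script for each field, one could also pass the field name as a parameter. The following variant is a sketch under stated assumptions, not the script used for this article: the subroutine name extract_field, the script name field.pl, and the two-argument command line are hypothetical, and the sketch assumes, as above, that each field label is followed by a colon and a tab.

```perl
use strict;
use warnings;

# Return "ISTC number<TAB>value" lines for one field label.
sub extract_field {
    my ($field, @lines) = @_;
    my ($value, $hit, @rows) = ('', 0);
    for my $line (@lines) {
        # \Q...\E keeps labels such as "Cataloguing Source" literal
        if ($line =~ /^\Q$field\E:\t(.*?)\s*$/) {
            ($value, $hit) = ($1, 1);
        }
        # the ISTC number closes the record, as in the script above
        if ($hit and $line =~ /^ISTC.*(i.\d{8})/) {
            push @rows, "$1\t$value\n";
            $hit = 0;
        }
    }
    return @rows;
}

# Hypothetical invocation: perl field.pl Imprint istc.txt > imprint.txt
if (@ARGV == 2) {
    my ($field, $file) = @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file for read: $!";
    print extract_field($field, <$fh>);
    close $fh;
}
```

Like the original, this variant saves only the first line of a field and skips records in which the field is absent. It reads the whole export into memory at once, which keeps the sketch short but may be slow on older hardware.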
Appendix 3: A Perl script for analyzing the IISTC imprint field
§ 19 On many occasions, it would be useful to turn the IISTC imprint field into separate fields for city, printer, and date of printing, and to clearly distinguish between signed and attributed information. The following script accomplishes this based on the output from the script in appendix 2 as applied to the Imprint field.
# This script takes as input a
# tab-delimited table of istc
# numbers and imprint fields,
# assumed here to be named 'imprint.txt'.
# This script outputs the istc number
# again as an index, followed by the
# first imprint field only, then fields
# containing the city and printer. Then
# it outputs the years: the average
# of all years in all imprint fields,
# the earliest and then the latest such
# year. The last column contains three
# flags, either + or -. Signed cities,
# printers, and dates appear as +++,
# while the opposite would be ---. Years
# appearing in single quotes ('1401')
# have been ignored.
# set imprint data file
$batch="imprint.txt";
# open the file to process, or give an error
# code
open BATCH, $batch or die "Cannot open $batch for read:$!";
# create column titles
print "istc_number\timprint\tcity\tprinter\tavg_year\tfirst_year\tlast_year\tflags\n";
while (<BATCH>) {
# first, reset all variables
undef @allyears;
undef @sort;
$firstyear=0;
$lastyear=0;
$avgyear = 0;
$yearcount = 0;
$flags='+++';
# save the input line as $record for later
# use
$record=$_;
# get first two tab-delimited fields, the
# ISTC Number and first imprint line
/^(.*?)\t(.*?)\t/;
$istc_number=$1;
$imprint=$2;
# search the imprint line for an optional
# opening bracket, then the city, then a
# colon, then the rest of the line
$imprint=~/^(\[|)(.*?)(?:\]: |: )(.*)$/;
$rightpart=$3;
$city=$2;
# if an opening bracket was found, flag
#the city as unsigned
if ($1) {substr $flags, 0, 1, "-"}
# split the rest of the line by commas,
# forming the array @printer
@printer=split /, /, $rightpart;
# fix 3 defective imprint lines not otherwise
# handled correctly: ip01005630 (no year),
# ic00216715 (no year), ir00334450. If there's
# no comma found in the rest of the line, and
# there's no number to be found, add a dummy,
# empty date element to the array
if ($#printer==0 and $printer[0]!~/\d/) {push @printer, ""}
# fix for two defective records with no
# imprint data: print the
# istc number and then skip the rest of the loop
if ($record=~/^([^\t]*?)\t$/) {
$istc_number=$1;
print "$istc_number\n";
next;
}
# remove the last element of @printer array;
# it's usually the date field
$date = pop @printer;
# fix for two deficient records containing neither
# city nor printer, just dates
if ($imprint !~/:/) {
$date = $imprint;
undef @printer;
$city = "";
}
# remove all brackets to test for a date; we
# need to find the ca. 150 records of the
# anomalous form 'City: printer, year, month
# and day'
$_ = $date;
s/[\[\]]//g;
$xdate=$_;
# remove all brackets from current last element
# of @printer array
$ydate=$printer[-1];
$ydate=~s/[\[\]]//g;
# if $date doesn't contain a year, then check
# the last element of @printer; if it does,
# pop it and prepend it to $date
if ($xdate !~/1[45]\d{2}|undated/i and $ydate=~/1[45]\d{2}/)
{
$date=pop(@printer).$date;
$_ = $date;
s/[\[\]]//g;
$xdate=$_;
}
# now obliterate dates in single quotes regarded
# as false
$xdate=~s/'.*?'//g;
# match a year 1400 to 1599; if we find it, use it,
# otherwise we have nothing to test. The match itself
# is tested because $1 keeps its old value when a
# match fails
if ($xdate=~/(1[45]\d{2})/) {$testyear=$1} else {$testyear=""}
# if we have a date to test, get the last two digits
if ($testyear) {$yeardigits=substr $testyear, 2, 2} else {$yeardigits='####'}
# if the last two digits are surrounded by brackets,
# flag the date as unsigned. [14]94 is treated
# as signed, 14[9]4 as unsigned
$_ = $imprint;
if(/\[[^\]]*$yeardigits[^\]]*\]|\[$yeardigits|$yeardigits\]/ or $yeardigits eq '####') {
substr $flags, 2, 1, "-";
}
# split the input line again on the tabs
@checkdates = split /\t/, $record;
# but discard the first two tabs
$null=shift @checkdates;
$null=shift @checkdates;
# and add the date field previously identified
unshift (@checkdates, $date);
# this next loop extracts all years from each
# imprint field in turn
foreach $possibledate (@checkdates) {
$_=$possibledate;
# remove brackets, get rid of '1401' dates
s/\[|\]|'.*?'//g;
# find simple years, like 1493, 1494-,
# 1498-1505
@simple_years=/(1[45]\d{2})/g;
# add the years found to the list
push (@allyears, @simple_years);
# find dates like 1476-80, also when they end the field
$_=$possibledate;
@complex_years=/(1[45]\d{2}[\-\/]\d{2})(?!\d)/g;
# first count the simple years in the next loop
foreach $simpleyear(@simple_years) {
$avgyear+=$simpleyear;
$yearcount++;
}
# and add the second part to the list of years
# in the following loop
foreach $complexyear (@complex_years) {
# find the element to split on: either - or /
$split=substr($complexyear,4,1);
# ignoring $temp[0], as it is already a simple_year
@temp = split /\Q$split\E/, $complexyear;
$temp[1]=substr($temp[0],0,2).$temp[1];
push (@allyears, $temp[1]);
$avgyear+=$temp[1];
$yearcount++;
}
}
# round to nearest year
if ($yearcount) {
$avgyear=int(($avgyear/$yearcount)+.5);
} else {
$avgyear = "";
}
# now sort the years numerically
@sort = sort { $a <=> $b } @allyears;
$firstyear=$sort[0];
$lastyear=$sort[-1];
# put the printer back together
$printer=join ', ', @printer;
# add missing front or back brackets for aesthetics only
$_ = $printer;
if (/^[^\[]+\]/) {$printer='['.$printer}
if (/\[[^\]]+$/) {$printer=$printer.']'}
# now get rid of all brackets and store as $xprinter
$_ = $printer;
s/[\[\]]//g;
# if the printer is enclosed in brackets, or begins with a
# bracket, flag as unsigned
$xprinter=$_;
if ($imprint=~/\[[^\]]*\Q$xprinter\E[^\]]*/ or
$printer=~/^\[/) {
substr $flags, 1, 1, "-";
}
# output the information and continue on to the next
# record
print "$istc_number\t$imprint\t$city\t$xprinter\t";
print "$avgyear\t$firstyear\t$lastyear\t$flags\n";
}
That is, the input file begins like this:
ia00000500	[Spain or Portugal: Printer of Alfasi's Halakhot, before 1492?]
ia00001000	Westminster: Wynkyn de Worde, [about 1496]
ia00001500	Westminster: Wynkyn de Worde, [about 1497]
ia00002000	Westminster: Wynkyn de Worde, [about 1500]
ia00003000	[London: John Lettou and William de Machlinia, about 1482]
ia00004000	[London]: Richard Pynson, 9 Oct. 1499
ia00004500	[London]: Richard Pynson, 9 Oct. 1499
ia00005000	[London]: Richard Pynson, '9 Oct. 1499' [about 1503]
ia00005500	[The Netherlands: Prototypography, about 1465-80]
ia00008000	Venice: Franciscus Lapicida, 20 Oct. 1494
The output of the further manipulation here appears as follows in eight different fields:
istc_number	imprint	city	printer	avg_year	first_year	last_year	flags
ia00000500	[Spain or Portugal: Printer of Alfasi's Halakhot, before 1492?]	Spain or Portugal	Printer of Alfasi's Halakhot	1492	1492	1492	---
ia00001000	Westminster: Wynkyn de Worde, [about 1496]	Westminster	Wynkyn de Worde	1496	1496	1496	++-
ia00001500	Westminster: Wynkyn de Worde, [about 1497]	Westminster	Wynkyn de Worde	1497	1497	1497	++-
ia00002000	Westminster: Wynkyn de Worde, [about 1500]	Westminster	Wynkyn de Worde	1500	1500	1500	++-
ia00003000	[London: John Lettou and William de Machlinia, about 1482]	London	John Lettou and William de Machlinia	1482	1482	1482	---
ia00004000	[London]: Richard Pynson, 9 Oct. 1499	London	Richard Pynson	1499	1499	1499	-++
ia00004500	[London]: Richard Pynson, 9 Oct. 1499	London	Richard Pynson	1499	1499	1499	-++
ia00005000	[London]: Richard Pynson, '9 Oct. 1499' [about 1503]	London	Richard Pynson	1503	1503	1503	-+-
ia00005500	[The Netherlands: Prototypography, about 1465-80]	The Netherlands	Prototypography	1473	1465	1480	---
ia00008000	Venice: Franciscus Lapicida, 20 Oct. 1494	Venice	Franciscus Lapicida	1494	1494	1494	+++
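The year columns for ia00005500 show how an abbreviated range is handled: "1465-80" contributes both 1465 and 1480, and their mean, 1472.5, is rounded to 1473. The following is a minimal sketch of that step in isolation, not the script above; the subroutine names years_in and average_year are hypothetical and were introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return every
# year named in one imprint date, expanding '1465-80' to 1465 and 1480.
sub years_in {
    my ($date) = @_;
    $date =~ s/'.*?'//g;      # drop years in single quotes, regarded as false
    $date =~ s/[\[\]]//g;     # drop the brackets marking attributed information
    my @years;
    while ($date =~ /(1[45]\d{2})(?:[-\/](\d{2})(?!\d))?/g) {
        push @years, $1;
        # borrow the century from the first year for the abbreviated one
        push @years, substr($1, 0, 2) . $2 if defined $2;
    }
    return @years;
}

# Mean of all years, rounded to the nearest year as in the avg_year column.
sub average_year {
    my @years = @_;
    return '' unless @years;
    my $sum = 0;
    $sum += $_ for @years;
    return int($sum / @years + 0.5);
}

my @years = years_in('about 1465-80]');
print join(', ', @years), ' => ', average_year(@years), "\n";
```

A full range such as "1498-1505" is read as two ordinary years, because the two digits after the hyphen are followed by further digits.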
Appendix 4: An approach to counting incunables using the IISTC
§ 20 Turning the IISTC's Locations field into a numerical count of surviving copies presents new challenges, as the format for recording copies varies considerably between regions. American, German, and Italian libraries are always divided by semicolons; Belgian and Other libraries usually appear as "City, First Library, Second Library"; Dutch, Spanish, and most Other European libraries appear as "City First Library, Second Library"; and French and British records mix both formats. In addition, one hopes but can never be sure that the frequent records describing a library's holdings of a given incunable as "(3, 1 defective)" consistently mean "three copies, of which one is defective" rather than "3 complete copies plus one defective one." Perfect accuracy in automatically counting the IISTC's countless incunables may not be possible, but a high degree of accuracy (verified by comparing computer-generated results with old-fashioned tabulation) is achievable and sufficient for answering many questions and for helping to formulate others.
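The two readings of a note such as "(3, 1 defective)" can be made concrete in a few lines. This is only a sketch: the subroutine name copies_in_note and its second argument are hypothetical, and the scripts in this appendix adopt the first reading, in which the leading number is already the total.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a holdings note such as '(3, 1 defective)'.
# With $additive false, the first number is the total; with $additive
# true, the second number is treated as additional copies.
sub copies_in_note {
    my ($note, $additive) = @_;
    if ($note =~ /\((\d{1,2}),\s*(\d{1,2})\b[^()]*\)/) {
        return $additive ? $1 + $2 : $1;
    }
    return $1 if $note =~ /\((\d{1,2})\)/;   # plain count, e.g. '(2)'
    return 1;                                 # no count given: one copy
}

for my $note ('(3, 1 defective)', '(2)', '(imperfect)') {
    printf "%-18s inclusive=%d additive=%d\n",
        $note, copies_in_note($note, 0), copies_in_note($note, 1);
}
```

Running both readings over the same data gives a rough measure of how much the ambiguity could affect a total.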
§ 21 For counting the number of extant copies in the IISTC, the process is broken down into two steps for the sake of simplicity. First, a simple script (or rather, ten minor variations on a simple script) is used to extract only the relevant data from the full export of IISTC records. The following script searches out only copies in American libraries:
$batch="istc.txt"; #name of file to search
open BATCH, $batch or die "Cannot open $batch for read:$!";
while (<BATCH>) {
if (/^[ ]*USA:\t(.*?)$/) {
$match = $1;
$hit=1;
}
if (/^ISTC.*(i.\d{8})/ and ($hit == 1)) {
$hit = 0;
$istc_number = $1;
print "$istc_number\t$match\n";
}
}
The output of this script is a long list of ISTC numbers and the libraries in which copies of the relevant incunable can be found:
ia00000500 JTSL (1 leaf)
ia00001000 PML
ia00002000 FolgSL; PandJG
ia00003000 AmBML; Harv(L)L; LC(L); NewL (-); PML
ia00004000 Harv(L)L; LC; PML; UPaL; YU(B)L; EHLS (sold 1981)
ia00005000 Harv(L)L; HEHL; LC(L)
ia00008000 CPhL; Harv(M)L; PML
With minor variations in the fourth line of the script, similar files can be created for the other locations by which the IISTC organizes its copy attestations: Belgium, British Isles, Other European, France, Germany, Italy/Vatican, Netherlands, Spain/Portugal, and Other. For Spain/Portugal, for example, the output begins:
ia00008000 Avila BP
ia00009200 Barcelona BCatal, BU; Córdoba BP; Madrid BN, BU; Sevilla Colombina, BU; Toledo BP; Vigo Massó; Lisboa BN
ia00012000 Avila BP
ia00014400 El Escorial RMon
ia00016500 Córdoba BCap
ia00017000 Sevilla Colombina; Coimbra BU
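Rather than keeping ten copies of the extraction script, the location label could be passed in as a parameter. The sketch below is one hypothetical way to do so; the subroutine name extract_location is an invention for this example, only the label "USA" is attested above, and the sample record layout (including the wording of the ISTC line) is an assumption that would need to be checked against the actual export.

```perl
use strict;
use warnings;

# Hypothetical generalization of the extraction script: collect one
# location field from exported records. $label is the field name as it
# appears in the export; only 'USA' is attested in this article.
sub extract_location {
    my ($fh, $label) = @_;
    my ($match, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^[ ]*\Q$label\E:\t(.*?)$/) {
            $match = $1;                  # remember the holdings line
        }
        elsif (defined $match and $line =~ /^ISTC.*(i.\d{8})/) {
            push @rows, "$1\t$match";     # pair it with the record's number
            undef $match;                 # wait for the next record
        }
    }
    return @rows;
}

# Assumed export layout, for illustration only: a location line,
# followed later by the line carrying the ISTC number.
my $export = "USA:\tFolgSL; PandJG\nISTC number: ia00002000\n";
open my $fh, '<', \$export or die "Cannot read sample: $!";
print "$_\n" for extract_location($fh, 'USA');
```

The returned rows have the same two tab-delimited columns as the output shown above, so the counting scripts could read them unchanged.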
The list of libraries in each location that own a given incunable is useful information that can be imported as ten new fields into the database as described in appendix 1. Even more useful would be a count of the copies in each location, since such counts can easily be summed to provide a worldwide incunable count (as far as the IISTC is concerned, at least). The following script provides such a count for American, Italian, and German libraries. It is invoked a bit differently from the preceding scripts, in that it expects two command-line arguments: the name of the file to be processed and the name of the file to be written. If this script were given the name count1.pl, it might be invoked as follows to read from the file usa-libraries.txt and create the file usa-count.txt:
perl count1.pl usa-libraries.txt usa-count.txt
The script is as follows:
# script to process library-output
# files for consistently
# semicolon-delimited countries: USA,
# Italy, Germany
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@libraries=split /;/, $locations;
foreach $library (@libraries) {
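# get rid of nested parentheses, innermost first, but keep
# those that begin with a copy count, such as (3), (12),
# or (3, 1 torn); the lookahead keeps two-digit counts
# from being discarded as comments
while ($library=~/\((?!\d{1,2}[,\)])[^\(]*?\)/) {
$library=~s/\((?!\d{1,2}[,\)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$library=~s/\((\d{1,2})[^\(]*\)/($1)/g;
if ($library=~/\((\d{1,2})\)/) {$copycount+=$1} else {$copycount++}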
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
The output of this script includes column headings. For German libraries, for example, it begins:
istc_number locations count
ia00008000 Bamberg SB; München BSB 2
ia00009100 Gotha ForschLB; Tübingen UB 2
ia00009200 Augsburg SStB; Bamberg SB; Berlin SB; Darmstadt LHSB; Freiburg i.Br. UB; Giessen UB; Göttingen SUB; Heidelberg UB; Karlsruhe BLB; Mainz StB; München BSB (3); München UB; Passau SB; Würzburg UB (2) 17
ia00009300 Frankfurt(Main) StUB (imperfect) 1
ia00009900 Hannover KestnerM 1
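The counting rule for semicolon-delimited locations can also be isolated in a subroutine and tried on a single line. This is a sketch rather than the script above: the name count_copies and the sample string are hypothetical, and the lookahead that preserves two-digit counts such as "(12)" is a precaution added for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the copies in a semicolon-delimited
# locations string such as 'Bamberg SB; Passau SB (2)'.
sub count_copies {
    my ($locations) = @_;
    my $count = 0;
    for my $library (split /;/, $locations) {
        # strip comments, innermost first, but keep parentheticals
        # that begin with a one- or two-digit copy count
        1 while $library =~ s/\((?!\d{1,2}[,)])[^()]*\)//;
        # reduce '(3, 1 imperfect)' to '(3)'
        $library =~ s/\((\d{1,2})[^()]*\)/($1)/g;
        # an explicit count wins; otherwise the library holds one copy
        $count += $library =~ /\((\d{1,2})\)/ ? $1 : 1;
    }
    return $count;
}

# Invented sample line: 1 + 3 + 12 copies.
print count_copies('Bamberg SB; Passau SB (3, 1 imperfect); Mainz StB (12)'), "\n";
```

Applied to the American line "AmBML; Harv(L)L; LC(L); NewL (-); PML" shown earlier, the same rule yields five copies, since parenthetical notes without a leading count are discarded before counting.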
The IISTC's variability in recording copies requires the script to be adapted for other locations, however. The next two scripts are minor variations on the preceding one. The first addresses locations such as Belgium that insert a comma between the name of the city and the libraries owning a particular incunable:
# script to process library-output files for
# countries delimited as City, Library1, Library2:
# Belgium, Other [usually]
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@cities=split /;/, $locations;
foreach $city (@cities) {
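# get rid of nested parentheses, innermost first, but keep
# those that begin with a copy count, such as (3), (12),
# or (3, 1 torn)
while ($city=~/\((?!\d{1,2}[,\)])[^\(]*?\)/) {
$city=~s/\((?!\d{1,2}[,\)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;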
undef @libraries;
if ($city =~ /,/) {
@libraries=split /,/, $city;
$null = shift @libraries;
} else {$libraries[0] = $city}
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {$copycount++}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
The next script is for locations such as Spain/Portugal that separate libraries within a single city from each other with commas, but without a comma after the name of the city:
# Script to process library-output files for countries
# delimited as City Library1, Library2: Other Europe,
# Spain, Netherlands, France (mostly), Britain (usually)
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@cities=split /;/, $locations;
foreach $city (@cities) {
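# get rid of nested parentheses, innermost first, but keep
# those that begin with a copy count, such as (3), (12),
# or (3, 1 torn)
while ($city=~/\((?!\d{1,2}[,\)])[^\(]*?\)/) {
$city=~s/\((?!\d{1,2}[,\)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;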
undef @libraries;
@libraries=split /,/, $city;
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {$copycount++}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
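Once each location has its own count file, the worldwide count mentioned earlier is a matter of adding the count columns together for each ISTC number. A minimal sketch follows, assuming the three-column files written by the scripts above; the subroutine name add_counts and the file names in the comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: add the count column of one per-location file
# (istc_number, locations, count) to a running total per ISTC number.
sub add_counts {
    my ($fh, $total) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($istc_number, $locations, $count) = split /\t/, $line;
        # skip the header row and any line without a numeric count
        next unless defined $count and $count =~ /^\d+$/;
        $total->{$istc_number} += $count;
    }
    return $total;
}

my %worldwide;
for my $file (@ARGV) {    # e.g. usa-count.txt and the other count files
    open my $fh, '<', $file or die "Cannot open $file for read: $!";
    add_counts($fh, \%worldwide);
    close $fh;
}
print "$_\t$worldwide{$_}\n" for sort keys %worldwide;
```

An edition absent from a given location file simply contributes nothing to its total, so the files do not need to list the same set of ISTC numbers.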
As explained above, the IISTC contains some records that are truly ambiguous as to the number of copies in question, and for some locations the formatting is inconsistent. In the case of inconsistent formatting, some further refinement can help reduce the inaccuracies. It is undoubtedly useful for the staff of the British Library for their copies to appear at the head of the list of libraries in the British Isles rather than with other London libraries, and with signatures of all their copies; however, for attempting a count based on this data, it is distinctly annoying. Consider the following data:
ia00017000 London BL, 167.f.13 = IB.27036; Chatsworth; Edinburgh NLS (Inc.207); Oxford Bodley (2), Magdalen, Pembroke (2) Colleges; Stonyhurst College
ia00018600 Cambridge, Trinity Hall; Oxford Bodley, All Souls College
ia00020500 London BL, IC.28708; Barnard Castle, Bowes Museum
ia00021000 Cambridge, Trinity Hall; Oxford, New College
How is a computer to know that "Oxford Bodley, All Souls College" refers to two copies, while "Oxford, New College" refers to just one? The assumption that a comma divides a city and its libraries must be modified with an explicit statement that Oxford is a city, as the following script attempts to implement. As a consequence, the anomalous recording of British Library copies results in an overcount by one that must be individually corrected.
# script to process library-output files
# for British Isles.
# Remove nested parentheses first, to get
# rid of semicolons within comments, and
# then split on semicolons; if there is
# a 'London BL', remove one from the total count
$in="libs-brit.txt"; # input file
$out = "bricount.txt"; # output file
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
# add column heads
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
# get rid of (digit) in BL signatures
$fixlocations=$locations;
while ($fixlocations=~/[^ ]\(\d\)/) {
$fixlocations=~s/[^ ]\(\d\)//g;
}
# get rid of nested parentheses
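# (innermost first, keeping those that begin with a
# copy count, such as (3), (12), or (3, 1 torn))
while ($fixlocations=~/\((?!\d{1,2}[,\)])[^\(]*?\)/) {
$fixlocations=~s/\((?!\d{1,2}[,\)])[^\(]*?\)//g;
}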
@cities=split /;/, $fixlocations;
foreach $city (@cities) {
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;
# replace (3, 1 torn) with (3)
$city=~s/\(\d{1,2} lea[^\(]*\)//g;
# eliminate e.g. (3 leaves)
if ($city=~/London BL,[^,]* and /) {$copycount++}
# correct for multiple BL signatures without
# comma dividers
$city=~s/(London|Oxford|Cambridge|Manchester|Dublin|Durham|Hereford|Edinburgh|Cashel|Guernsey|Coleraine|Barnard Castle|Parkminster|Northampton|Reigate|Birmingham|Canterbury|Harpenden|Brasenose|Killiney),/$1/;
# eliminate commas after city names
undef @libraries;
@libraries=split /,/, $city;
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {
$copycount++;
}
if ($library=~/London BL/) {
$copycount--;
}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
#print "$istc_number\t$locations\t$copycount\n";
}
Some sample output illustrates that Perl can deal with a great deal of discrepancy in formatting and still arrive at a correct count, while the question of what ownership of a "copy" of an incunable really means is a separate issue entirely:
ia00425700 Oxford Bodley 1
ia00426000 London BL, IB.21897 (Acquisition 1985, not in BMC. Bound with Nicolaus Perottus, Rudimenta grammatices, Lyons, anonymous press (IB.21897) and Aelius Anthonius Nebrissensis, Introductiones Latinae, Logroño, Arnao de Brocar, 1510. In a Spanish binding); Oxford Bodley 2
ia00426300 Cambridge, St John's College (2 ff.) 1
ia00426500 London BL, IB.21851 1
ia00426600 London BL, Harl.5918(2) = IA.49742 (Colophon only, in the Bagford Collection) 1
ia00426700 Cambridge UL (imperfect, wants a2-7 and all after K6); Oxford Bodley (fragment consisting of ff. f1,6, quire E and ff. I2-5) 2
ia00428000 London BL, IA.20854 (Imperfect, wanting leaf g7 and sheets h4, i4) 1
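The effect of naming the cities explicitly can be seen on individual entries. The sketch below is a simplified illustration rather than the script above: the subroutine name count_british_entry is hypothetical, the city list is abbreviated to four of the twenty names, and the entry is assumed to have had its parenthetical comments removed already.

```perl
use strict;
use warnings;

# Abbreviated, illustrative city list; the script above names twenty.
my @CITIES  = ('London', 'Oxford', 'Cambridge', 'Barnard Castle');
my $CITY_RE = join '|', map { quotemeta } @CITIES;

# Hypothetical helper: count the copies in one semicolon-delimited entry.
sub count_british_entry {
    my ($entry) = @_;
    # 'Oxford, New College' becomes 'Oxford New College'
    $entry =~ s/^\s*($CITY_RE),/$1/;
    my $count = 0;
    for my $library (split /,/, $entry) {
        $count += $library =~ /\((\d{1,2})\)/ ? $1 : 1;
    }
    # 'London BL, IC.28708': the shelfmark after the comma is not
    # a second copy, so take one back off
    $count-- if $entry =~ /London BL/;
    return $count;
}

for my $entry ('Oxford Bodley, All Souls College', 'Oxford, New College',
               'London BL, IC.28708') {
    print "$entry => ", count_british_entry($entry), "\n";
}
```

Any city missing from the list is still counted, but a comma after its name would be read as separating two libraries, which is why the list in the script grew to cover every such case found in the data.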
Acknowledgements
The author wishes to thank Alvan Bregmann, Bryce Inouye, and Paul Needham for their kind assistance and helpful suggestions for this article.
Works cited
The British Library. 1998. The illustrated ISTC on CD-ROM. 2nd ed. London: Primary Source Media, in association with the British Library.
Copinger, Walter Arthur. 1895-1902. Supplement to Hain's Repertorium bibliographicum: or, collections toward a new edition of that work. London: H. Sotheran.
Davies, Martin and John Goldfinch. 1992. Vergil: a census of printed editions 1469-1500. Occasional Papers of the Bibliographical Society 7. London: The Bibliographical Society.
Gesamtkatalog der Wiegendrucke. 1925-. 10 vols. to date. Stuttgart: Hiersemann.
Hain, Ludwig. 1826-1838. Repertorium bibliographicum, in quo libri omnes ab arte typographica inventa usque ad annum MD. typis expressi, ordine alphabetico vel simpliciter enumerantur vel adcuratius recensentur. 2 vols. Stuttgart: J. G. Cotta.
Neddermeyer, Uwe. 1998. Von der Handschrift zum gedruckten Buch: Schriftlichkeit und Leseinteresse im Mittelalter und in der frühen Neuzeit. Quantitative und qualitative Aspekte. Buchwissenschaftliche Beiträge aus dem deutschen Bucharchiv München 61. 2 vols. Wiesbaden: Harrassowitz.
Needham, Paul. 1999. Counting incunables: the IISTC CD-ROM. Huntington Library Quarterly 61: 457-529.
Ohly, Kurt and Vera Sack. 1966-1967. Inkunabelkatalog der Stadt- und Universitätsbibliothek und anderer öffentlicher Sammlungen in Frankfurt am Main. Frankfurt: Klostermann.
Reichling, Dietrich. 1905-1911. Appendices ad Hainii-Copingeri Repertorium bibliographicum. 7 vols. Munich: Rosenthal.
Reske, Christoph. 2000. Die Produktion der Schedelschen Weltchronik in Nürnberg. Wiesbaden: Harrassowitz.