Introduction
§ 1 For half a century scholars of Anglo-Saxon England have had the benefit of an annual bibliography of works in their field compiled by authors who were not only first-rate scholars but meticulous bibliographers. The work began as a mimeographed typescript first circulated by Stanley B. Greenfield in the 1950s and continued by Fred C. Robinson in the 1960s; in 1967 it was incorporated into the Old English Newsletter (OEN), then a new publication sponsored by the Old English Group of the Modern Language Association (on the history of the OEN Bibliography see Berkhout 2004). The OEN Bibliography was continued by Alan K. Brown in 1971, then by Carl T. Berkhout in 1975, who produced it for 25 years with admirable precision and thoroughness. The Bibliography is now produced by Professor Tom Hall of the University of Illinois at Chicago, with the assistance of Professor Melinda Menzer of Furman University.
§ 2 The annual OEN Bibliography has become a reference work of first resort in the field of Anglo-Saxon studies. Unlike other works such as the MLA Bibliography and the ITER online bibliography, the OEN Bibliography is fully interdisciplinary, covering all aspects of Anglo-Saxon history, language and culture, and its focus on one historical period and geographical area gives it a depth and richness unparalleled in modern scholarship. Its history, as Berkhout 2004 recounts, is linked to the annual bibliography which appears in the journal Anglo-Saxon England, but the broader inclusiveness and fuller bibliographic detail found in the OEN Bibliography make it a more complete record of scholarship in the field. While the OEN Bibliography cannot claim complete coverage in some areas (such as archaeology, which is the special concern of the annual British and Irish Archaeological Bibliography, online at http://www.biab.ac.uk/), it does try to cover the whole of Anglo-Saxon studies from manuscripts to linguistics, from numismatics to literary theory, from histories of the discipline to archaeological fieldwork, in a way that is both responsible to individual sub-fields and useful to the broadest range of scholarly interests. Surely there are few Anglo-Saxonists who do not owe a debt of gratitude to this work and to its companion review, the Year’s Work in Old English Studies, published annually in OEN. I think it is safe to say, however, that anyone who uses the bibliography regularly, leafing through piles of back issues in an office or library carrel, has at one point or another dreamed of having it in a comprehensive or compiled version, something like a supplement to the indispensable Greenfield and Robinson 1980.
§ 3 In dreams, it has been said, begin responsibilities. Three years ago I was asked to succeed Jon Wilcox as editor of OEN; in the throes of my early enthusiasm one of the many things I imagined was how useful it would be to have a searchable electronic version of the annual bibliography— more useful, even, than a printed supplement to Greenfield and Robinson 1980. I was delighted to obtain the endorsement and support of Carl Berkhout himself, who had already envisioned such a thing; when I broached the subject with the publisher of OEN, I was told that as Editor I was free to explore any avenues I liked, provided they cost him nothing and involved no work or obligations on his part. In fact I had no more idea how to create such a searchable online bibliography than I had of how to edit a quarterly newsletter; I was completely ignorant in the fields of bibliography, database construction, and all but the rudiments of HTML. But my ignorance of the complexity of such a project gave me the confidence to pursue it. I began to imagine how I might interest someone else in creating such a site: I studied the printed Bibliographies to get a sense of the kinds of material the database would contain; I explored other websites, noting features I liked and disliked; I began to read everything I could about databases and interfaces, so that I might be able to outline the project to a web-design professional. Three years and countless hours later, still waiting for that professional to appear, I have constructed the site myself. The database has finally achieved a stable and workable form, and is now available to the public.
§ 4 In the sections that follow I will move between a description of the project as I now understand it and a narration of the long and messy process by which I came to this understanding. I make no claims for the OEN Bibliography database as a model for other projects—the story of its development may be more cautionary than exemplary—but in the process of creating it I have formed some opinions about how such projects ought (and ought not) to be planned and managed, how a relational database should (and should not) be made, and how a complex online project can gain and lose efficiency at each stage of its implementation. Examples of code (e.g. the search queries in §§19-22) are presented as illustrative only; more explicit and informed tutorials are widely available online.
Database Tables
§ 5 From a conceptual standpoint the OEN Bibliography database consists of three parts: a body of information stored in database tables, a set of search routines used to retrieve this information, and the codes used to display search forms and search results in a browser. Though obviously interdependent, each part has been designed to be separate from the others and thus can be changed without having to reconstruct the whole site or re-enter any data; the data tables can be changed, supplemented, corrected, or reorganized without affecting the search or display routines, and the appearance of the site can be altered without having to rewrite the code used to store or retrieve data. This has been, without a doubt, one of the most valuable insights I have gained from this project, and like most insights it is something of a paradox—one must keep the component parts of a complex system as separate from one another as possible, yet always consider the ways each is dependent on the others.
§ 6 The data itself consists of a number of individual records (more than 17,500 as of January 15, 2006) in a MySQL database table; each record is an entry from the annual OEN Bibliography from 1973 to 2002 (the database will be updated annually as new bibliographies appear). The search routines used to retrieve these items and the code used to display them are written in PHP (http://www.php.net/), a general-purpose scripting language that dynamically generates text and markup which can be embedded into standard HTML. The choice of MySQL/PHP was made more or less fortuitously and motivated largely by cost: both are free and Open Source programs which can be used without the payment of expensive site licenses or fees. There are no doubt other databases and other programming languages which might offer various advantages over MySQL and PHP (PERL is often mentioned as a more general-purpose alternative to PHP; the two languages are compared at http://www.thesitewizard.com/archive/phpvscgi.shtml), but the combination of MySQL and PHP has proved to be powerful and reliable. PHP and MySQL are available for a wide variety of platforms and are almost universally supported on commercial and academic web servers; they are well documented (online documentation can be found at http://dev.mysql.com/doc/ and http://www.php.net/manual/en/index.php), many published guides are available (over the course of the project I found the most consistently helpful to be Greenspan and Bulger 2001, though others—such as Williams, Lane and Oram 2004—are equally useful), and countless websites offer extensive online repositories of code routines, examples, tutorials, and instruction. MySQL databases can easily be managed via phpMyAdmin (http://www.phpmyadmin.net/home_page/index.php), a web-based interface which greatly simplifies the work of creating and editing databases and tables.
§ 7 From the beginning of the project the conceptual model has been the World Shakespeare Bibliography, edited by James L. Harner at Texas A&M University and published by the Johns Hopkins University Press in association with the Folger Shakespeare Library (http://www.worldshakesbib.org/). This site, which grew out of the annual bibliographies in the Shakespeare Quarterly, provided not only a standard of speed and organization to which I could aspire, but also (not always consciously) a template for the appearance of the OEN Bibliography site: a banner above, a search form on the left, search results on the right. It also offered an idea of the kinds of searches that should be available —browsing by subject, quick searches by keyword, and advanced searches by various specific criteria. In order to carry out such searches efficiently, the individual bibliographical records needed to be separated into discrete elements: at the minimum, I realized, the database tables would need separate fields for authors, titles, publication information and subject headings. In other words, every entry in the printed Bibliography would have to be analyzed and deconstructed before it could be inserted into a database table.
§ 8 With this realization in mind, I began to re-examine the different kinds of items found in the OEN bibliographies and compare them to some widely-used formats for digital bibliographic records, expecting that I would quickly find some universally-accepted set of data fields I could use for my database tables. By far the best-known and most widely-used format is the MARC21 system used by the Library of Congress and most US libraries (http://www.loc.gov/marc/); this is a venerable and well-documented standard, but it proved to be a daunting, obscure, and in the end unhelpful model, too complex in some respects and not complex enough in others. A typical MARC record, no doubt familiar to many readers from online university library catalogues, looks something like this:
LDR 01111cam 2200325 a 4500001 1896162
005 19961115104608.7
008 840518s1985 tnu b s001 0 eng
035 $9(DLC) 84011889
906 $a7$bcbc$corignew$d1$eocip$f19$gy-gencatlg
010 $a 84011889
020 $a0870494449 (alk. paper) :$c$8.95
040 $aDLC$cDLC$dDLC
050 00$aPR1588$b.R6 1985
082 00$a829.3$219
100 1 $aRobinson, Fred C.
245 10$aBeowulf and the appositive style /$cby Fred C. Robinson.
260 $aKnoxville :$bUniversity of Tennessee Press,$cc1985.
300 $a106 p. ;$c23 cm.
440 4$aThe Hodges lectures
504 $aBibliography: p. 84-103.
500 $aIncludes index.
630 00$aBeowulf.
650 0$aEnglish language$yOld English, ca. 450-1100$xApposition.
650 0$aEnglish language$yOld English, ca. 450-1100$xStyle.
650 0$aEpic poetry, English (Old)$xCriticism, Textual.
650 0$aChristianity in literature.
650 0$aPaganism in literature.
650 0$aRhetoric, Medieval.
991 $bc-GenColl$hPR1588$i.R6 1985$tCopy 1$wBOOKS
§ 9 The complex numerical codes and abbreviations seemed needlessly abstract for my purposes (not to mention intimidating to a bibliographical novice). And as I came to understand the structure of the MARC system a little better, I became aware of its inherent limitations, or rather of the fact that its structure reflects its purpose, which is primarily to help put books on library shelves. On the one hand, some of the information found in a typical MARC record (the book’s ISBN, physical dimensions, and Library of Congress call number) was not included in the OEN bibliographies. On the other hand, while MARC records are very good at describing individual books or monographs, there is no easy way to create records for articles in journals or essays in collections, yet these comprise the majority of the entries in the OEN Bibliography.
§ 7 Equally uninspired by other available bibliographic standards, and eager to
begin my project, I devised my own set of fields for the MySQL table that would
contain the bibliography items. My first effort yielded separate fields for
names of up to six authors (last and first names and additional ‘parts’ of names,
at this point not distinguishing between name elements like ‘Jr.’ and authorial
functions like ‘ed.’ or ‘trans.’), titles, journal-number-pages or
publisher-place-pages, year of publication, language, type of item, ‘notes’ (a
catch-all field in which I placed anything that seemed not to fit elsewhere),
and two subject headings. This seemed adequate at the time, but it became
increasingly unwieldy as the variety of items in the database grew and the
amount of specificity required of the searches increased.
§ 10 In retrospect I clearly suffered from my lack of experience
in this task, and my failure to consult those more expert in such matters; a
great deal of energy and time went into developing a system whose limitations
should have been obvious to me. Whether inserting items in the database or
writing search routines, I found that small details could become big problems:
an item’s year of publication, for example, is not always the year listed on a
journal’s masthead; a book might belong to two series at once, or be published
both as a special number of a journal and a separately-titled volume. I had not
included any easy way to name the author of only part of a book, such as a
Foreword or Introduction. Problems arose from the choices I had made for
encoding information in the database (or, to be honest, the choices I had not
known I was supposed to make). Non-Roman characters were especially frustrating
until I finally converted all tables and HTML pages to UTF-8 encoding. Inserting
HTML tags to generate italics within article titles made a more attractive
display, but also made it harder to search for these titles: Ruth Wehlau’s essay
‘The Power of Knowledge and the Location of the Reader in Christ and Satan’
(JEGP 97 [1998]: 1-12), for example, would not be found by searching
for ‘the reader in Christ and Satan’ because the title field of the
database actually reads ‘The Power of Knowledge and the Location of the
Reader in <em>Christ and Satan</em>’. Every gain in sophistication, it seemed, resulted in a loss of
efficiency or functionality.
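One way around this particular trade-off is to keep two copies of each title: the tagged copy for display and a tag-free copy for phrase searching. The following is a minimal sketch in Python rather than the PHP used on the site; the function name plain_title and the regular expression are illustrative assumptions, not the project's actual code.

```python
import re
from html import unescape

# Matches any HTML tag such as <em> or </em>; assumed adequate for the
# simple italic markup described above, not for arbitrary HTML.
TAG_RE = re.compile(r"<[^>]+>")


def plain_title(title: str) -> str:
    """Return a tag-free, whitespace-normalized copy of a display title."""
    without_tags = TAG_RE.sub("", title)
    # Decode entities such as &amp; and collapse runs of spaces left by tag removal.
    return " ".join(unescape(without_tags).split())


display_title = ("The Power of Knowledge and the Location of the "
                 "Reader in <em> Christ and Satan </em>")
search_title = plain_title(display_title)

# A phrase search against the tag-free copy now succeeds.
found = "the reader in christ and satan" in search_title.lower()
```

The cost of this approach is a second stored field for every title, which is a reasonable price when the alternative is stripping tags at query time for every record.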
§ 11 It was not long before I realized that I had designed a
small-scale system for a large-scale project. Having a much better sense both of
the diversity of information contained in the Bibliography’s entries and of the nature of MySQL search routines,
I grew increasingly concerned about the makeshift quality of my data tables,
about my lack of attention to standard bibliographic formats, and, looking
ahead, about the possibility of exchanging data between the OEN
Bibliography database and others, which seemed highly unlikely
unless the structure of my database tables could be made to conform to some
accepted standard. I began a more thorough investigation of bibliographic
formats and eventually discovered the Library of Congress’ Metadata
Object Description Schema
(MODS; http://www.loc.gov/standards/mods/), an XML schema for electronic
bibliographies. The system is similar in some respects to the MARC format, but
far more flexible, more easily understandable, and more easily adapted to
different types of bibliographic items, including journal articles and essays in
collections. A typical MODS record is as follows (the example is taken from http://www.loc.gov/standards/mods/v3/mods-userguide-examples.html):
<mods>
<titleInfo>
<title>Hiring and recruitment practices in academic libraries</title>
</titleInfo>
<name type="personal">
<namePart>Raschke, Gregory K.</namePart>
<displayForm>Gregory K. Raschke</displayForm>
</name>
<typeOfResource>text</typeOfResource>
<genre>journal article</genre>
<originInfo>
<place>
<placeTerm type="text">Baltimore, Md.</placeTerm>
</place>
<publisher>Johns Hopkins University Press</publisher>
<dateIssued>2003</dateIssued>
</originInfo>
<language>
<languageTerm authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<form authority="marcform">print</form>
<extent>15 p.</extent>
</physicalDescription>
<abstract>
Academic libraries need to change their recruiting and hiring procedures … [omitted] … innovative concepts from modern personnel management literature.
</abstract>
<subject>
<topic>College librarians</topic>
<topic>Recruiting</topic>
<geographic>United States</geographic>
</subject>
<subject>
<topic>College librarians</topic>
<topic>Selection and appointment</topic>
<geographic>United States</geographic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>portal: libraries and the academy</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
</originInfo>
<part>
<detail type="volume">
<number>3</number>
</detail>
<detail type="level">
<number>2</number>
</detail>
<extent unit="page">
<start>53</start>
<end>67</end>
</extent>
<date>Jan. 2003</date>
</part>
</relatedItem>
</mods>
§ 12 I am not ashamed to admit that the immediate appeal of this
system lay in the fact that I could understand its XML tags simply by reading
them; a closer study of the system convinced me that it also offered all the
flexibility, specificity, and portability that I would need. Adopting the
organizational structure of MODS, even if it meant a wholesale revision of the
project, would mean that entries in the OEN
Bibliography database could, in theory, be exported to other systems
via the common ground of the MODS schema. The MODS system served more as a model
of organization than an entirely new container for my data; some features, such
as the ad libitum repeatability of fields like <name> and <subject>, were
impossible in a fixed-field database table, while others, such as the
specification that a journal is ‘print’ and ‘continuing’, or the separation of
page numbers into ‘start’ and ‘end’, seemed unnecessarily precise considering the
nature of the items found in the OEN Bibliography.
Nevertheless, the present fields in the database tables can be translated into a
MODS schema with relative ease. I rebuilt the database tables according to the
following structure, and over the course of a tedious month transferred all
entries into the new tables:
type: type of item: journal article, essay in a collection, monograph, edition, facsimile, etc. Each is represented by a two-letter abbreviation: ‘es’ for an article in a journal, ‘ec’ for an article in a collection, etc.
name1fam: surname of the first named author
name1giv: given name of the first named author
name1add: added part of name, e.g. Jr. or III (similarly for five additional names)
altName: the names of any persons in the item, presented without accents or special characters, e.g. Andre Crepin for André Crépin. This makes it possible to search for such names without having to type any special characters (this feature was suggested to me by Lilla Kopár).
role: the relation of the names to the work, i.e. as author, editor, translator, etc. If there are several named persons with different roles, the field is subdivided and the distribution of roles is indicated as, e.g., 01author|02editor
title: the title of a work (whether a book or journal article)
altTitle: the title without HTML tags, used for phrase searching
subtitle: used only for books
host title: name of journal or volume in which an item appears
host subtitle: used only for anthologies and collections
host editors: used if a work appears in an edited collection
partsAuthors: used if the work has other persons associated with it as authors of Forewords, Introductions, etc.
series name: for books in a series
series number
journal number
place: place of publication for books or collections
publisher
place2: used when a work has two listed places of publication
publisher2
dateIssued
date created
edition: used to indicate 2nd or revised editions, etc.
extent: page numbers of articles, or number of pages in a book
language
note: used for translated titles of foreign works, special comments on an item, etc.
identifier: used to cross-reference essays in collections to one another and to the collection, if it is listed separately. May be used in future for ISSN or ISBN numbers.
subject1
subject2
subject3
subject4
keywords: additional keywords under which the item might be sought; useful when neither the title nor the subject headings offer enough specific information about the subject of the work.
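Two of the derived fields in this list lend themselves to a brief illustration. The sketch below is written in Python, not the site's PHP, and its function names are hypothetical. It shows one way an altName value might be generated by stripping combining accents, and one way a subdivided role value such as 01author|02editor might be read, on the assumption that the two-digit prefix gives the position of the name within the entry.

```python
import unicodedata


def ascii_name(name: str) -> str:
    """Strip accents so that 'André Crépin' can be found as 'Andre Crepin'.

    Letters with no accent decomposition (þ, ð, æ) pass through unchanged;
    they would need a separate substitution table.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_roles(role_field: str) -> dict:
    """Split '01author|02editor' into {1: 'author', 2: 'editor'}.

    Assumption: an unprefixed value applies to every name and is stored
    under key 0.
    """
    roles = {}
    for part in role_field.split("|"):
        if part[:2].isdigit():
            roles[int(part[:2])] = part[2:]
        else:
            roles[0] = part
    return roles


alt_name = ascii_name("André Crépin")
roles = parse_roles("01author|02editor")
```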
§ 13 Obviously no item in the database uses all these fields,
but I had begun to realize that it is wise to parse data as finely as possible,
separating items into the largest number of fields I could imagine, and to use
each field for only one kind of information. This allows the data to be searched
more quickly and with greater precision; it is also easier to reassemble several
fields into one string of text when a record is retrieved than to search for a
sub-string in a larger data field. When a record is retrieved, a relatively
simple PHP routine reassembles the fields into a standard bibliographic entry:
it instructs the display to print the author’s last name, then a comma, then the
first name; if there is no second name, add a period; if there is a second name
but no third name, add the word ‘and’ and the second given name and
surname; put quotation marks around the titles of journal articles and HTML
italic tags around
book titles; and so on. The result is a text string basically identical to that
found in the printed Bibliography.
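The reassembly rules just described can be sketched as follows. This is an illustrative Python version, not the PHP routine used on the site; the dictionary keys, the list of (family, given) pairs, and the type code 'bk' for a book are simplifying assumptions ('es' is the code for a journal article given in the field list).

```python
def format_names(names):
    """Join (family, given) pairs: first author inverted, the rest in natural order."""
    family, given = names[0]
    text = f"{family}, {given}"
    rest = [f"{g} {f}" for f, g in names[1:]]
    if len(rest) == 1:
        text += f" and {rest[0]}"
    elif rest:
        text += ", " + ", ".join(rest[:-1]) + f", and {rest[-1]}"
    # Avoid a doubled period after an initial such as "Fred C."
    return text if text.endswith(".") else text + "."


def format_entry(rec):
    """Rebuild a display string from separate fields."""
    names = format_names(rec["names"])
    if rec["type"] == "es":  # article in a journal: title in quotation marks
        return (f'{names} "{rec["title"]}." <em>{rec["host_title"]}</em> '
                f'{rec["journal_number"]} ({rec["date_issued"]}), {rec["extent"]}.')
    # Fallback for this sketch: treat any other type as a book, title in italics.
    return (f'{names} <em>{rec["title"]}</em>. '
            f'{rec["place"]}: {rec["publisher"]}, {rec["date_issued"]}.')


article = {
    "type": "es",
    "names": [("Schwyter", "Jürg Rainer")],
    "title": "OE Conjunction <em>þeah (þe)</em>: Law II Cnut 72.1 and II Cnut 75",
    "host_title": "Neophilologus",
    "journal_number": "85",
    "date_issued": "2001",
    "extent": "291-96",
}
entry_text = format_entry(article)
```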
§ 14 The first stage of the project, before I understood any of this, involved getting the data into digital form. Some of the annual Bibliographies were available as MS Word documents from Carl Berkhout’s website at the University of Arizona (http://www.u.arizona.edu/~ctb/; click on the cat to find them); the rest needed to be scanned from copies of OEN and turned into text files with OCR software (I used Readiris 9, one of the few options available for Mac OS X). In all cases some alteration to the entries was necessary—the short titles used to refer to frequently-cited items needed to be expanded, the long dashes used to avoid repeating an author’s name in consecutive entries had to be removed, HTML tags for italics had to be added, missing elements such as the publisher’s name had to be located and supplied, and abbreviations had to be regularized. The resulting bibliographies were then saved as a series of plain-text files.
§ 15 Even with the data in electronic form, however, it was obvious that I faced a formidable task; far more difficult than devising MySQL search queries to retrieve items from the database (on which see §§19-22 below) was the creation of a routine to put the items into the database in the first place. Consider these two fairly typical items, taken at random from the 2001 Bibliography:
Schwab, Ute. "Die beiden 'Runenglossen' im deutsch-insularen Gregorius-Homiliar Clm 3731 (saec. VIII ex.)." Mittelalterliche volkssprachige Glossen: Internationale Fachkonferenz des Zentrums für Mittelalterstudien der Otto-Friedrich-Universität Bamberg 2. bis 4. August 1999. Ed. Rolf Bergmann, Elvira Glaser, and Claudine Moulin-Fankhänel. Heidelberg: C. Winter, 2001. 77-100.
Schwyter, Jürg Rainer. "OE Conjunction þeah (þe): Law II Cnut 72.1 and II Cnut 75." Neophilologus 85 (2001), 291-96.
§ 16 Any competent reader will instantly interpret the bibliographical conventions used here, but to a computer these are just strings of characters. The task at hand was to parse each item into sub-strings to be assigned to the appropriate fields in the database table. In these two examples the parsing would have to produce the following:
Schwab | Ute | Die beiden 'Runenglossen' im deutsch-insularen Gregorius-Homiliar Clm 3731 (saec. VIII ex.) | Mittelalterliche volkssprachige Glossen | Internationale Fachkonferenz des Zentrums für Mittelalterstudien der Otto-Friedrich-Universität Bamberg 2. bis 4. August 1999 | Ed. Rolf Bergmann, Elvira Glaser, and Claudine Moulin-Fankhänel | Heidelberg | C. Winter | 2001 | 77-100
Schwyter | Jürg Rainer | OE Conjunction <em>þeah (þe)</em>: Law II Cnut 72.1 and II Cnut 75 | Neophilologus | 85 | 2001 | 291-96
§ 17 In addition, information would need to be added to every entry: subject headings, alternate spellings of names containing non-Roman characters, information on the type and language of the item, and so on. If I had had an army of graduate students and the temperament of a Pharaoh, I might have been tempted to assign the work to be done by hand; working alone, however, with no budget, I needed some sort of routine that could slice bibliographical entries, no matter how diverse or complex, into their constituent parts and place each of them into the appropriate field of a database table.
§ 18 Ultimately, the consistency of citation in the published
bibliography made it possible to write a set of PHP functions to do just that.
Given a few basic hints (the type of item, the number of authors, the language),
the parsing routine reads in a line from the bibliography text file and scans it
for certain characteristic markers: quotation marks around article titles,
colons before publishers’ names, parentheses around the dates of journal
articles, the abbreviation ‘Ed.’ after a title, and so on. It uses these
markers to subdivide the string into smaller and smaller elements; the resulting
information is then saved as a MySQL INSERT query in another text file. The
parser also checks journal and series titles for consistency, flagging any minor
but critical errors of reference (such as calling a journal Econ. Hist. Rev. one year and Economic Hist.
Review the next). My own work consists largely in formatting the
text files, providing the initial hints, assigning subject headings, pressing
the ‘Submit’ button, and proofreading the output before it
goes into the query file; with more than 750 items for each year of the Bibliography, and a new set of items appearing every year,
this has proved to be more than enough.
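To give a sense of how such marker-based parsing can work, here is a minimal sketch for one item type only, the journal article. It is written in Python rather than the site's PHP; the regular expression, the function name parse_article, and the heuristic for trailing periods are assumptions made for illustration, and the real routine handles many more cases.

```python
import re

ARTICLE_RE = re.compile(
    r'^(?P<family>[^,]+),\s+'               # surname, up to the first comma
    r'(?P<given>.+?\.)\s+'                  # given name(s), up to the period before the title
    r'"(?P<title>.+)\."\s+'                 # article title inside quotation marks
    r'(?P<host>.+?)\s+(?P<number>\d+)\s+'   # journal name and volume number
    r'\((?P<year>\d{4})\),\s+'              # year of the volume, in parentheses
    r'(?P<extent>[\d-]+)\.$'                # page range
)


def parse_article(line: str):
    """Split a journal-article citation into fields, or return None if the
    line does not fit the expected pattern (to be flagged for manual review)."""
    m = ARTICLE_RE.match(line.strip())
    if m is None:
        return None
    given = m["given"]
    # Heuristic: keep the period after an initial ("Fred C."), drop it after
    # a full name ("Jürg Rainer."), where it only closes the name element.
    if len(given.split()[-1].rstrip(".")) > 1:
        given = given[:-1]
    return {
        "type": "es",
        "name1fam": m["family"],
        "name1giv": given,
        "title": m["title"],
        "host_title": m["host"],
        "journal_number": m["number"],
        "dateIssued": m["year"],
        "extent": m["extent"],
    }


line = ('Schwyter, Jürg Rainer. "OE Conjunction þeah (þe): Law II Cnut 72.1 '
        'and II Cnut 75." Neophilologus 85 (2001), 291-96.')
record = parse_article(line)
```

Returning None for anything that does not match, instead of guessing, keeps the proofreading step honest: only entries that fit the expected conventions are passed along automatically.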
Search Routines
§ 19 As I have noted, the items in the database are theoretically separate from the search routines used to retrieve them; even the major changes introduced into the structure of the database tables required only minor adjustments to the search routines, and wholesale revision of the searches requires no alteration to the data tables. Compared to the problems of designing appropriate tables for database items, the problem of writing search routines seemed relatively minor—a number of highly explicit guides are available for creating such routines, and those used in the OEN Bibliography database are closely modeled on the public-domain work of others. In retrospect, however, it was a mistake to work from database forward rather than from the user backward; the problems of database design and information storage seemed more pressing at the time than issues of intelligibility and ease of use, but these have in fact proved to be far more subtly intractable. Data and its retrieval are never entirely separate in practice; each must be conceived as a complement to the other. As a general principle I now believe that a database designer should begin by imagining the user and the various ways the information will be accessed, and then create the structures and routines that will make these possible— moving as it were from the visible to the invisible, from the search and result screens to the data and its storage (my thinking on web design has been dependent on such works as Krug 2005, Nielsen 2000, on the various articles on Nielsen’s website at http://www.useit.com/, and on the Web Design and Usability Guidelines of the U.S. Department of Health and Human Services at http://usability.gov/).
§ 20 The OEN Bibliography database can
be searched in three different ways, each with its own advantages and
disadvantages. Certain options are common to all search modes, such as the
ability to limit searches to one year or a range of years, to sort by date,
author, or title, and to specify how many items should be displayed on each
screen. In imagining the search routines I assumed that the most obvious one
from a user’s point of view would be a ‘browse’ mode, which
presents a list of subject headings. This is, after all, how the material
appears in the printed Bibliography. The annual OEN Bibliography has relatively few subject headings; the
online database offered the chance to create a much more precise system of
classification, something more than a broad list of subject areas but less than
a full index of topics. Existing systems such as the Library of Congress Subject
Headings seemed to fit uncomfortably on the items in the database, so (once
again) I began to create my own. I imagined that a hierarchical system, proceeding
through subheadings such as ‘History and Culture > 10th century > Æthelstan’
or ‘Archaeology > towns and large settlements > London’, would be the most
logical; the user would be able to browse subjects at any level of specificity,
viewing, for example, everything written about ‘towns and large settlements’ in
general as well as those items about London in particular. I hoped these would
serve as a makeshift substitute for the ‘full analytical subject index’ requested
by Carl Berkhout in his description of his own conception of the project
(Berkhout 2004: 111). The database allows each item to have up to four different
subject headings; the ‘browse’ search screen lists each level
of subject headings, allowing the user to click on the heading and retrieve all
items that contain that heading in any one of its four subject fields. The
search query to retrieve these items is straightforward (in the example below,
$search is replaced by the subject heading sought; % indicates that any number
of characters, or none, may follow the search string, and * instructs MySQL to
return the entire record for each item found):
SELECT * FROM `entries` WHERE (subject1 LIKE '$search%'
or subject2 LIKE '$search%' or subject3 LIKE '$search%'
or subject4 LIKE '$search%');
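The effect of the trailing % on a hierarchical heading can be seen in a small self-contained sketch. It uses Python with an in-memory SQLite table as a stand-in for MySQL and PHP; the sample rows and item titles are invented for illustration. Passing the heading as a bound parameter, instead of writing it into the query string, is a design choice of this sketch and not a description of the site's code.

```python
import sqlite3

SUBJECT_FIELDS = ("subject1", "subject2", "subject3", "subject4")


def browse(conn, heading):
    """Return every record whose subject fields begin with the given heading."""
    where = " OR ".join(f"{field} LIKE ?" for field in SUBJECT_FIELDS)
    pattern = heading + "%"  # % matches any continuation, including none
    sql = f"SELECT * FROM entries WHERE ({where})"
    return conn.execute(sql, [pattern] * len(SUBJECT_FIELDS)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (title TEXT, subject1 TEXT, subject2 TEXT, "
    "subject3 TEXT, subject4 TEXT)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
    [
        ("Item A", "Archaeology > towns and large settlements > London", None, None, None),
        ("Item B", "History and Culture > 10th century",
         "Archaeology > towns and large settlements", None, None),
        ("Item C", "History and Culture > 10th century > Æthelstan", None, None, None),
    ],
)

# A search on the parent heading also retrieves the London item beneath it.
towns = sorted(row[0] for row in browse(conn, "Archaeology > towns and large settlements"))
london = sorted(row[0] for row in browse(conn, "Archaeology > towns and large settlements > London"))
```

Note that a heading containing % or _ would itself be read as a wildcard; a production version would need to escape those characters.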
§ 21 As it turns out, however, the ‘browse’ function appears not to be used
very often by visitors to the site. Access logs
kept by the host server indicate that most users use the database to find
specific titles, or search for topics by keyword (which, since it searches
subject headings as well as titles and authors, provides in effect a kind of
shortcut to a subject listing). Moreover, the subject headings are variously
inadequate; they reveal too clearly my own limitations (as in the disorganized
system of subject headings for ‘art’) or lack of foresight (as in the uselessly
sparse subject headings for ‘language’),
and some of the subject classifications strike even me as idiosyncratic. The
database urgently needs a thorough revision of its subject headings, guided by
the advice of experts in the various sub-fields covered by the bibliography. I
still believe, however, that the ‘browse’ function is a
useful one; it allows a casual user to gain an overview of work on a given topic
with relative ease, or a researcher to quickly retrieve all items written on a
given text over the past thirty years.
§ 22 By far the most complex search routine is the one with the
simplest interface, the ‘keyword’ search. Presented with a
plain text box, the user can type in any number of search terms, using common
search-engine prefixes such as + for required terms, -
for excluded terms, | for optional terms, and quotation marks for
specific phrases. The ‘keyword’ search looks for matches in
subject fields, names, and titles. MySQL allows full-text searching with a
relatively simple query such as SELECT DISTINCT * FROM `entries` WHERE
MATCH(field1, field2, field3, etc.) AGAINST ('keyword1 keyword2 keyword3
etc.'); this instructs the database to return all items in which any
of the listed fields matches the keyword strings. But the number of fields to be
searched, and the small size of each field, made such a full-text search
impractical for the Bibliography database. Instead, a
PHP function generates a search query by taking the string of text entered by
the user, breaking it into an array of individual terms or phrases, examining
each member of the array for prefixed signs indicating AND,
OR, or NOT conditions, and creating a MySQL query
of unpredictable complexity across the required fields of each entry. A query
such as +a |b |c -d, for example (meaning find all entries
containing term a but not term d, and either term
b or c
), generates a query as follows, in
which the nested parentheses do much of the tricky work of inclusion and
exclusion (in the interest of space most of the search fields—more than fifteen
for each search term—have been omitted; their absence is indicated by
ellipses):
SELECT DISTINCT * FROM `entries`
WHERE ( subject1 LIKE 'a' OR subject2 LIKE 'a' OR subject3 LIKE 'a' … OR keywords LIKE 'a' )
  AND ( (subject1 NOT LIKE 'd') AND (subject2 NOT LIKE 'd') AND (subject3 NOT LIKE 'd') … AND (keywords NOT LIKE 'd') )
  AND ( ( subject1 LIKE 'b' OR subject2 LIKE 'b' OR subject3 LIKE 'b' … OR keywords LIKE 'b' )
     OR ( subject1 LIKE 'c' OR subject2 LIKE 'c' OR subject3 LIKE 'c' … OR keywords LIKE 'c' ) )
§ 23 This strikes me as something of a brute-force method, and I am not entirely happy with the keyword function; it is a tribute to the speed and efficiency of MySQL, however, that even a tortuously complex query like this one takes only a second or two to search over 17,500 records and return the correct set of results.
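The translation from a prefixed search string to a nested query, as described in § 22, might be sketched as follows. This is an illustration in Python rather than the PHP used on the site, and it is not the site's actual routine. The field list is a hypothetical stand-in for the fifteen or more fields the article does not name; the function names are invented; wrapping each term in % wildcards for substring matching is an assumption; and treating an unprefixed term as optional is likewise an assumption. The sketch returns the query with %s placeholders (the style used by common Python MySQL drivers) and a separate parameter list, so that user input is never pasted directly into the SQL.

```python
import re

# Hypothetical field list; the real database checks more than fifteen fields.
FIELDS = ["subject1", "subject2", "subject3", "keywords"]

# An optional prefix (+, -, |) followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'([+\-|]?)(?:"([^"]*)"|(\S+))')


def parse_terms(text):
    """Split user input into (required, optional, excluded) term lists."""
    required, optional, excluded = [], [], []
    for prefix, phrase, word in TOKEN.findall(text):
        term = phrase or word
        if not term:
            continue  # skip an empty quoted phrase
        if prefix == "+":
            required.append(term)
        elif prefix == "-":
            excluded.append(term)
        else:
            # "|" marks an optional term; unprefixed terms are
            # treated the same way (an assumption of this sketch).
            optional.append(term)
    return required, optional, excluded


def field_group(operator, joiner):
    """Return one parenthesized test of a single term against every field."""
    return "(" + joiner.join(f"{f} {operator} %s" for f in FIELDS) + ")"


def build_query(text):
    """Return (sql, params) for a prefixed keyword search string."""
    required, optional, excluded = parse_terms(text)
    clauses, params = [], []
    for term in required:   # the term must appear in at least one field
        clauses.append(field_group("LIKE", " OR "))
        params += [f"%{term}%"] * len(FIELDS)
    for term in excluded:   # the term must appear in no field
        clauses.append(field_group("NOT LIKE", " AND "))
        params += [f"%{term}%"] * len(FIELDS)
    if optional:            # at least one optional term must appear
        groups = []
        for term in optional:
            groups.append(field_group("LIKE", " OR "))
            params += [f"%{term}%"] * len(FIELDS)
        clauses.append("(" + " OR ".join(groups) + ")")
    where = " AND ".join(clauses) if clauses else "1"
    return f"SELECT DISTINCT * FROM entries WHERE {where}", params


sql, params = build_query("+a |b |c -d")
print(sql)
```

Two limitations of the sketch are worth noting: a NOT LIKE test against a NULL field is neither true nor false in SQL, so rows with empty fields would need separate handling, and any % or _ typed by the user would act as a wildcard unless escaped.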
§ 24 The most powerful and flexible search mode is the “advanced” search. This is a
fairly straightforward search, with form fields corresponding closely to database
fields, but it presented several challenges and offered several opportunities. The
OEN Bibliography records items as they are published, so that articles by the same
author may appear under different names, such as “Roy M. Liuzza” or “Roy Michael
Liuzza” or “R. M. Liuzza”. Searching for Liuzza would find all these, but searching
for Roy and Liuzza would find only some of them. Whenever a first name is
specified, then, the “advanced” search adds a small routine to search for EITHER
the whole name OR the first initial. A limited “keyword” function enhances the
specificity of a search, but is not meant to duplicate the functions of the
“keyword” search. Drop-down menus for “Type of Item” and “Language” allow one to
find, for example, all translations, or all editions in French, or all works
published in Russian in the 1980s. The “Journal” field, which allows a user to find,
for example, all articles published in Anglo-Saxon England in a given year (or
within any range of years), also allows access (by clicking on the word “Journal”
in the search form)
to a complete alphabetical list of all the journals indexed in the database,
with links to and information about online sites where available. This list,
like the master list of subject headings, is kept in a text file for faster
reference; a PHP routine reads the text file into a large array and displays
only the parts requested, either alphabetically or by search string. The
“advanced” search is also the only search mode which
retrieves reviews independently of the items they are reviewing; these are
listed after other search results.
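The name-matching routine mentioned above (whole first name OR first initial) could look something like the following sketch. It is written in Python against an in-memory SQLite table, which stands in here for the site's PHP and MySQL. The column names first_name and last_name and the function author_condition are hypothetical, and the row with the first name “Robert” is invented purely to show a non-match.

```python
import sqlite3


def author_condition(last, first=None):
    """Return a WHERE fragment and its parameters for an author search."""
    if not first:
        return "last_name LIKE ?", [last]
    initial = first[0]
    # Accept either the full first name (optionally followed by middle
    # names or initials) or a bare first initial such as "R." or "R. M.".
    return (
        "last_name LIKE ? AND (first_name LIKE ? OR first_name LIKE ?)",
        [last, first + "%", initial + ".%"],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("Roy M.", "Liuzza"),
        ("Roy Michael", "Liuzza"),
        ("R. M.", "Liuzza"),
        ("Robert", "Liuzza"),  # invented row: should not match "Roy"
    ],
)

where, params = author_condition("Liuzza", "Roy")
rows = conn.execute(f"SELECT first_name FROM entries WHERE {where}", params)
found = sorted(row[0] for row in rows)
print(found)
```

A pattern of this kind is deliberately generous: searching for “Roy” would also retrieve a hypothetical “Royce”, and any author recorded only as “R.” with the same surname, which seems an acceptable cost for not missing variant forms of the same name.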
Display and Design
§ 25 In the search routines the constant goal has been to make them smaller and more efficient. In the display of results, however, it again becomes important to consider the user’s experience of the site, and to present pages that are easy to read and understand despite their density of information. In this, as in the search routines, PHP has been a useful scripting language. PHP routines not only submit queries to the database and retrieve the results; they also reassemble the results into standard bibliographical formats and display them on the screen, numbered, linked, and subdivided into pages.
§ 26 The display of search results is handled similarly for all search modes by a call to a single PHP function. The results of a search query may contain any number of individual records; the function calls up each record, reassembles its fields into a bibliography entry (following as far as possible the guidelines of the Chicago Manual of Style), and adds the resulting text string to an array. Each member of the array can then be displayed on the screen. Behind the scenes of this process are several other activities, including the creation of links for saving an item or showing a record in more detail.
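The reassembly step described above can be illustrated with a short sketch, again in Python rather than the site's PHP. The field names, the two citation patterns (which only loosely follow Chicago style), the page size of 25, and the sample records are all assumptions made for the illustration; the real function handles many more item types and also builds the links mentioned above.

```python
def format_entry(record):
    """Assemble one record (a dict of fields) into a single citation string."""
    author = "{last_name}, {first_name}".format(**record)
    if record.get("journal"):
        # Journal article: Author. "Title." Journal volume (year): pages.
        return '{0}. "{title}." {journal} {volume} ({year}): {pages}.'.format(
            author, **record
        )
    # Book: Author. Title. Place: Publisher, year.
    return "{0}. {title}. {place}: {publisher}, {year}.".format(author, **record)


def paginate(entries, page, per_page=25):
    """Return the (number, entry) pairs that belong on one results page."""
    start = (page - 1) * per_page
    chunk = entries[start:start + per_page]
    return [(start + i + 1, entry) for i, entry in enumerate(chunk)]


# Placeholder records, not real bibliography items.
records = [
    {"last_name": "Doe", "first_name": "Jane", "title": "A Sample Title",
     "journal": "Sample Journal", "volume": "12", "year": 2001,
     "pages": "1-20"},
    {"last_name": "Doe", "first_name": "Jane", "title": "A Sample Book",
     "place": "Sample City", "publisher": "Sample Press", "year": 1999},
]

entries = [format_entry(r) for r in records]
for number, entry in paginate(entries, page=1):
    print(f"{number}. {entry}")
```

Keeping the formatting in one function and the paging in another mirrors the division described in the article: every search mode can hand its results to the same routine and receive numbered, display-ready strings in return.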
§ 27 The “Detail” window, which essentially
presents a stripped-down version of the database table for an item, has a number
of features not available from the screen showing search results. Authors’ names
and subject headings are made into hyperlinks which launch a search query for
other works by that author or on that subject. Journal titles (which are often
abbreviated) are also hyperlinks, which when clicked supply additional
information about the journal, including its full title, ISSN,
information about online content and a link to its homepage. If an item is an
essay in a collection, a hyperlink retrieves other items from that collection or
information about the volume itself, if it is catalogued separately in the
database; records for books include a list of reviews. Records in the “Detail”
window can be emailed or printed.
§ 28 The use of this “Detail” window function, however, raised a problem of
design standards. As a general rule,
links which open pop-up windows (and the JavaScript routines required to create
them) are deprecated by those web designers who advocate strict compliance with
W3C standards (the World Wide Web Consortium publishes guidelines for many
languages at http://www.w3.org/; a series of arguments against
the use of pop-up windows can be found at http://www.sitepoint.com/article/beware-opening-links-new-window); presumably because pop-up windows are annoyingly overused by advertisers,
many browsers allow users to block pop-ups entirely. The OEN
Bibliography site has tried to comply with recommended design
standards as much as possible, but in this case the usefulness of a small window
to display detailed information and, if desired, conduct collateral searches
seemed worth the violation of stringent design principles.
§ 29 Any (or all) results of a search can be saved to a list. This is another table in the database, designed so that any item saved there is cleared after two days (to avoid overloading the table with undeleted saved items). The table holds only the user’s login name, the date, and the reference numbers of saved items; these items are retrieved and displayed like regular search results, but again in a pop-up window rather than the main area for search results. The list of saved items may also be printed or emailed; individual items can be viewed in detail mode or deleted from the list. Again, the convenience of having this information in a separate window so that the user can review the list of saved items without losing the results of his or her most recent search has seemed worth the trouble of having PHP and HTML code that is not in strict compliance with the latest recommended standards.
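A saved-items table of the kind described above, holding only a login name, a date, and a reference number, might be sketched like this. The example uses Python with an in-memory SQLite database as a stand-in for the site's PHP and MySQL; the table and column names are hypothetical, and the article does not say how or when the two-day clearing is triggered, so the explicit purge_old call is an assumption.

```python
import sqlite3
from datetime import date, timedelta


def new_store():
    """Create an empty saved-items table (names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE saved_items (login TEXT, saved_on TEXT, entry_id INTEGER)"
    )
    return conn


def save_item(conn, login, entry_id, today):
    """Record that a user saved one bibliography entry on a given day."""
    conn.execute(
        "INSERT INTO saved_items VALUES (?, ?, ?)",
        (login, today.isoformat(), entry_id),
    )


def purge_old(conn, today, max_age_days=2):
    """Delete every saved item more than max_age_days old."""
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    conn.execute("DELETE FROM saved_items WHERE saved_on < ?", (cutoff,))


def saved_ids(conn, login):
    """Return the reference numbers a user currently has saved."""
    rows = conn.execute(
        "SELECT entry_id FROM saved_items WHERE login = ? ORDER BY entry_id",
        (login,),
    )
    return [row[0] for row in rows]


conn = new_store()
day0 = date(2005, 1, 1)
save_item(conn, "user1", 101, day0)
save_item(conn, "user1", 202, day0 + timedelta(days=3))
purge_old(conn, today=day0 + timedelta(days=3))
print(saved_ids(conn, "user1"))
```

Because the table stores only reference numbers, the saved list can be displayed by passing those numbers back through the same formatting routine used for ordinary search results.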
§ 30 The design problems raised by these two small windows
suggest some of the larger design questions I have gradually come to regard as
central to the success or failure of the OEN
Bibliography site. At an early stage I realized that compliance with
web standards would create a site that was not only more flexible and more
likely to survive the inevitable changes in web practices and browser
capabilities in the years to come, but also easier to use; standards-compliant
design, as it turns out, is also user-friendly design, and offers greater
accessibility to a broader range of users. For this reason I moved quickly from
a frames-based approach to the site to a layout based purely on CSS (Cascading
Style Sheets). This greatly improved the efficiency of the PHP and HTML code; it
also made me aware, however, of how differently the meaning of “compliance” is
construed by the creators of different browsers. Quite apart from the differences
in font sizes and screen resolutions between platforms, which are a serious enough
matter for a site which is heavily
text-based and tries to squeeze a great deal of information onto the screen at
once, the site simply behaves differently in different web browsers. This is not
a problem unique to the OEN Bibliography site, of
course; some of the problems, and some solutions, are presented at http://www.positioniseverything.net/, and various cross-browser
compatibility issues are discussed at http://www.codestyle.org/index.shtml. In a standards-compliant
browser such as Mozilla, for example, the banner across the top of the OEN Bibliography site and the search form on the left side
remain fixed, as they are designed to do, while the search results scroll past
and underneath them. In Internet Explorer for Windows, however (still the most
commonly used browser in the world), this is not the case: the top banner scrolls
away, and the side panel disappears as the user scrolls through the search
results. It would be a simple matter to rebuild the site with frames and avoid
this incongruity, but CSS is universally, and I think rightly, regarded as the
best way to create a webpage layout. In this case it has seemed more important
to achieve forward-compatibility through adherence to web standards than
backward-compatibility with older or stubbornly idiosyncratic browsers. But it
has not been easy to accept that the site will look quite different to different
users; with each small change in the display or style settings—a change in font
size, color, position, etc.—I have to check the appearance of the site on as
many different platforms and browsers as I can find.
Testing and Further Development
§ 31 After some disappointment with academic web servers, I placed the OEN Bibliography database on a commercial web host (http://www.ipowerweb.com/) where generous amounts of server space and bandwidth were available at a reasonable cost. This has so far been a dependable host and seems like an acceptable alternative to the free but not always reliable services and support of an academic institution (I realize here that my experience is uncommon; most universities have first-rate support and well-trained staff dedicated to creating online projects, and if these had been available I would certainly have used them). At the same time the Old English Newsletter itself has moved online in a companion site (http://www.oenewsletter.org/OEN/), which also uses PHP and plain-text files with minimal HTML markup to display current content and an archive of back issues of OEN. The site also offers a searchable database of conference abstracts, and frequently updated notices of conferences and events. Both sites appear to be surviving their initial period of public testing with very few problems, and feedback from users has allowed me to make adjustments and improvements in their usability. Users of the database and the online OEN are asked to submit corrections, additions, and suggestions as they use the sites, so that their quality and accuracy will improve with use.
§ 32 Already imagined for the near-term future are improvements to subject headings and a system for streamlining the annual addition of new items, which is still done by hand from a text file of the most recent printed Bibliography. I hope that some generous users in Germany, Italy, and elsewhere will offer language localizations for the instructions, notices and field labels on forms. I would also like to give users the choice of having search results displayed and saved in MLA, APA, or University of Chicago format. It would be useful to have an easy way to access the database through desktop bibliographic software like EndNote or ProCite, but I have no idea how to create such an interface; I would be happy to collaborate with any reader who could help me do so.
§ 33 Two much-requested developments that will probably not be forthcoming in the near future are the addition of reviews or abstracts for all items, and a direct link to online journal repositories such as JSTOR and Project MUSE. While I recognize that both would be very useful features, the former would represent an enormous investment of time, and unless the database suddenly attracts a massive groundswell of volunteers, such work is not likely to happen. The latter would involve too many complex links to proprietary sites; most online journals restrict access in one way or another, and the paths by which these articles can be accessed vary too greatly from one system to the next to make any uniform set of links practicable.
§ 34 What is most important for the long-term health of the project, I think, is more involvement by more people; looking back over the past three years, I see too clearly that I should never have undertaken a project of this size and complexity alone. Keeping the database current, improving its accuracy and completeness, developing its functionality in productive and interesting ways, and dealing with routine maintenance will require a team of collaborators who understand the project and are willing to share the work. The time and energy invested in this database have been equivalent to the production of a substantial monograph; my biggest lesson has been that a project of this sort, unlike writing a book, cannot really be done alone.
§ 35 The OEN Bibliography database is a tribute to the hard work and meticulous scholarship of the printed Bibliography’s authors. I hope it will be useful to scholars and students, and that others will be inspired to develop similar resources, and will learn from my experience how to do it better and more efficiently than I have. I welcome any comments, questions or suggestions; please send these to rliuzza@utk.edu.
Appendix: Useful web sites for learning MySQL and PHP
- Using Apache web server with Microsoft Windows: http://httpd.apache.org/docs/1.3/windows.html.
- Setting up Apache, PHP and MySQL on Windows: http://www.devnewz.com/2004/0303.html.
- Activating the Apache web server in Mac OS X: http://www.macdevcenter.com/pub/a/mac/2001/12/07/apache.html.
- Installation modules for MySQL and PHP on Mac OS X (from Mark Liyanage): http://www.entropy.ch/software/macosx/welcome.html.
- MySQL home page: http://www.mysql.com/ (outstanding documentation).
- PHP home page: http://www.php.net/ (also excellent documentation).
- phpMyAdmin home page: http://www.phpmyadmin.net/home_page/index.php.
- The Library of Congress “Metadata Object Description Schema”: http://www.loc.gov/standards/mods/.
- Tutorial on setting up a MySQL database with PHP: http://www.freewebmasterhelp.com/tutorials/phpmysql/1.
- Build Your Own Database Driven Website Using PHP and MySQL (online book by Kevin Yank): http://www.sitepoint.com/article/php-mysql-tutorial/.
Works cited
Berkhout, Carl T., 2004. The bibliography of Old English: Back to the future, in Jonathan Wilcox, ed., Old English scholarship and bibliography: Essays in honor of Carl T. Berkhout, OEN Subsidia 32. Kalamazoo: Medieval Institute Publications. 107-19.
Greenfield, Stanley, and Fred C. Robinson, 1980. A bibliography of publications on Old English literature to the end of 1972. Toronto: University of Toronto Press.
Greenspan, Jay, and Brad Bulger, 2001. MySQL/PHP database applications. New York: M&T Books.
Krug, Steve, 2005. Don’t make me think: A common-sense approach to web usability. 2nd ed. Berkeley: Peachpit Press.
Nielsen, Jakob, 2000. Designing web usability: The practice of simplicity. Indianapolis: New Riders.
Williams, Hugh E., David Lane, and Andy Oram, eds., 2004. Web database applications with PHP and MySQL. 2nd ed. Sebastopol: O’Reilly.