Thursday 20 July 2023

The Endings Project and the Canterbury Tales Project (and also, Boccaccio and Dante)

At last, after many years, we (i.e., me and a few other people) are getting ready to unleash on the world a whole series of digital scholarly editions. We have already released the second edition of Prue Shaw's Commedia, now at www.dantecommedia.it, and we are now contemplating a third edition. Soon to come is Bill Coleman and Edvige Agostinelli's edition of Boccaccio's Teseida. And then the really big one: the first tranches of the Critical Edition of the Canterbury Tales Based on All Known Pre-1500 Witnesses, with myself and Barbara Bordalejo as General Editors. All of these will appear in the next twelve months.

Why so long? We (as before) have been working on all these since the 1990s (the Dante and Chaucer) and 2000s (Boccaccio). There are multiple reasons. For this post, one reason is especially important: we wanted to be sure each edition could survive the chances of online time. It should stand alone, for decades and even centuries to come, as surely as a print edition might survive upon a library shelf. How could we achieve this, given all the shifting currents of the digital world?

We were not the only people worrying about this. From 2016 a five-year SSHRC grant (Canada) funded the Endings project. This project took as its starting point a number of digital projects based at the University of Victoria which faced exactly the same issue we had: how can these projects be given the best chance of survival long into the future? In fact, I did not come across the Endings project until a long way into the making of Shaw's second Commedia edition. By this time I had already reached conclusions identical (or nearly so) to those of the Endings project, as follows:

1. While our development of these editions had used custom database technologies to present and edit all project data, our published editions would not use databases or any related "server-side" technology at all: no databases, no PHP, no Python, nothing. That is: everything would be contained on one server, with no outside dependencies at all so far as our texts are concerned.

2. Our presentation of the texts would rely solely on the core web technologies of HTML5, CSS and JavaScript. Nothing else.

3. Any departures from these principles for any part of our edition (for example: the use of external JavaScript libraries; the use of IIIF image viewers) would use widely-used open source tools.

These principles correspond to the Endings project principles 4.1, 4.2 and 4.9. In some areas, however, our practice differs from that of the Endings project. For example, we do use the jQuery library, which in my view has now achieved core web technology status. I think the same is becoming true of the IIIF family. However, I do not think the same is true of XML technologies (nor, interestingly, do the Endings people) and we do not use XSLT, etc., as any part of our final publication model. We also use query strings, which again seem to me a core web technology, where Endings does not. Nor do we aim for "graceful failure" where CSS, JavaScript or something else does not work. It seems to me that providing all source data within the edition, permitting others to fashion new interfaces to our data, is the best way of anticipating any failure.
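To make the query-string point concrete, here is a minimal sketch of how a static page might read its view state from the URL entirely in the browser, with no server-side code involved. The parameter names (`witness`, `canto`), the defaults and the example URL are assumptions made up for illustration; they are not the names our editions actually use. `URL` and its `searchParams` are standard web APIs.

```typescript
// Hypothetical view state: which witness and which canto to display.
// The names "witness" and "canto" are illustrative assumptions only.
interface ViewRequest {
  witness: string;
  canto: number;
}

// Read the view state from a page URL, falling back to defaults
// when a parameter is missing or malformed.
function parseViewRequest(href: string): ViewRequest {
  const params = new URL(href).searchParams;
  const witness = params.get("witness") ?? "base";
  const cantoRaw = Number.parseInt(params.get("canto") ?? "", 10);
  const canto = Number.isNaN(cantoRaw) || cantoRaw < 1 ? 1 : cantoRaw;
  return { witness, canto };
}

// In a browser this would be called as parseViewRequest(window.location.href).
const request = parseViewRequest(
  "https://example.org/edition.html?witness=Ham&canto=5"
);
console.log(request);
```

Because the whole state lives in the URL, any view can be bookmarked or cited, and the page still works when served as plain files from any web server.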

One might object: we are making a bet on certain core technologies now still being core technologies centuries in the future. Yes we are. But we see this bet as being in the same category as the bet scholars have made for millennia: that there will be a library or other place somewhere in the future which has a shelf for my book.

Another principle of the Endings project is that it will not use an external service to provide functionality, and specifically names Google Search as such a service. In my early preparations for the Shaw edition, I had investigated using Google Search to provide a search tool. Indeed, the second edition at www.dantecommedia.it implements searching in exactly this way. You can see from just a cursory use of Google Search in the second edition how unsatisfactory it is. Searching for "come", one of the most common words in the Commedia, gives just one result; "tanto" yields none at all. Many search results begin with advertisements, for holidays, or beer. I spent many hours trying to get Google Search to do better, including feeding it hard-wired URLs to every page of transcription. Nothing seemed to work. It appears the Google algorithms rebel when faced with nine near-identical texts, and fail over and over to return anything like meaningful results.

For these reasons I was contemplating just how a stand-alone search system might be implemented, when I came across the Endings project, and StaticSearch. They had done it, and it worked! On another page, I describe my experiences of StaticSearch.
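For readers wondering what a stand-alone search can look like in principle, here is a minimal sketch of the general idea: build a word index once, at publication time, and query it in the browser. This is not StaticSearch's code or file format, only an illustration of the technique; the page ids, function names and the deliberately crude tokenizer are all assumptions of mine. The sample text is the opening three lines of the Inferno.

```typescript
// Map from lower-cased word to the set of page ids containing it.
type SearchIndex = Map<string, Set<string>>;

// Split text into lower-cased words on whitespace and common punctuation.
// A deliberately crude tokenizer, for illustration only.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?'"()]+/)
    .filter((word) => word.length > 0);
}

// Build the index once, at publication time. The result could be saved as
// static files and fetched by the browser, so no server-side code runs.
function buildIndex(pages: Record<string, string>): SearchIndex {
  const index: SearchIndex = new Map();
  for (const pageId of Object.keys(pages)) {
    for (const word of tokenize(pages[pageId])) {
      let ids = index.get(word);
      if (ids === undefined) {
        ids = new Set<string>();
        index.set(word, ids);
      }
      ids.add(pageId);
    }
  }
  return index;
}

// Return the sorted ids of pages that contain every word of the query.
function search(index: SearchIndex, query: string): string[] {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let hits = Array.from(index.get(words[0]) ?? new Set<string>());
  for (const word of words.slice(1)) {
    const pagesForWord = index.get(word) ?? new Set<string>();
    hits = hits.filter((id) => pagesForWord.has(id));
  }
  return hits.sort();
}

// Hypothetical page ids; the text is Inferno 1, lines 1-3.
const pages: Record<string, string> = {
  "inferno-1-line-1": "Nel mezzo del cammin di nostra vita",
  "inferno-1-line-2": "mi ritrovai per una selva oscura,",
  "inferno-1-line-3": "ché la diritta via era smarrita.",
};
const index = buildIndex(pages);
console.log(search(index, "selva oscura"));
```

Every word is indexed, however common, and nothing outside the edition's own files is consulted when a reader searches.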

