Sunday, 17 September 2023

Setting up the revised Collation Editor: file structures

In an earlier post, I explained some of the history behind the Collation Editor and our use of it in Textual Communities (TC). At last, I am updating the Collation Editor embedded in TC!

The Collation Editor has two major dependencies:

  1. On Python, for a series of critical tasks run through a Python server;
  2. On CollateX, for the actual collation.

The first task was to create a version of the Collation Editor Core implementing both dependencies. I did this by mirroring the structure of the stand-alone collation editor code (available at https://github.com/itsee-birmingham/standalone_collation_editor). Thus, this is what the top-level folder looks like in my implementation (in my installation, in /Applications/Collation_Editor_Core):

That is: at the root level I have a folder holding CollateX, with the collatex-tools jar in it. There is a folder labelled "collation", which we will look at in a moment. There are two Python files, and then a .sh and a .bat file which start up the application (this structure is taken from the current stand-alone collation editor structure).
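The layout just described can be checked with a few lines of Python. This is only a sketch under stated assumptions: the folder names "collatex" and "collation" and the script name startup.sh are the ones used in this post, the function name missing_top_level_items is made up for illustration, and the jar is matched loosely because its exact file name depends on the CollateX version you have.

```python
from pathlib import Path


def missing_top_level_items(root):
    """Return the expected top-level items that are absent under root.

    Hypothetical helper: it checks only the items named in this post.
    """
    root = Path(root)
    missing = []

    # The two folders at the root level.
    for folder in ("collatex", "collation"):
        if not (root / folder).is_dir():
            missing.append(folder + "/")

    # The collatex-tools jar; the version suffix is not assumed.
    jar_dir = root / "collatex"
    if jar_dir.is_dir() and not any(jar_dir.glob("collatex-tools*.jar")):
        missing.append("collatex/collatex-tools*.jar")

    # The start-up script for macOS/Linux.
    if not (root / "startup.sh").is_file():
        missing.append("startup.sh")

    # The Python files at the root level (their names are not assumed).
    if not any(root.glob("*.py")):
        missing.append("*.py")

    return missing
```

Calling missing_top_level_items("/Applications/Collation_Editor_Core") returns an empty list when everything listed above is in place, and otherwise names what could not be found.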

Within the "collation" folder, here is what I have:

And then, going still deeper, this is the content of the "core" folder:

You see here a series of .py files, all needed for the link to Python to work. However, we need to have an index.html file in place to run the instance. The index.html file is actually contained within the "collation/static" folder, as follows:

Here is what the index.html file has, in this starter configuration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=8" />
  <title>Collation Editor</title>
  <meta name="description" content="Collation and Apparatus Editor" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <script>
    var SITE_DOMAIN = "http://localhost:8080";
    var staticUrl = SITE_DOMAIN + '/collation/';
  </script>
  <script type="text/javascript" src="/collation/js/jquery-3.3.1.min.js"></script>
  <script type="text/javascript" src="/collation/js/jquery-ui.min.js"></script>
  <link rel="stylesheet" href="/collation/pure-release-1.0.0/pure-min.css" type="text/css" />
  <script type="text/javascript" src="/collation/CE_core/js/collation_editor.js"></script>
  <script type="text/javascript">
    var servicesFile = 'js/local_services.js';
    collation_editor.init();
  </script>
</head>
<body oncontextmenu="return false;">
<div id="header" class="collation_header">
<h1 id="stage_id">Collation</h1>
<h1 id="project_name"></h1>
<div id="login_status"></div>
</div>
<div id="container">
<p>Loading, Please wait.</p>
<br/>
<br/>
</div>
  <div id="footer"></div>
<div id="tool_tip" class="tooltip"></div>
</body>
</html>

Note that the "src" and "href" attributes point to "/collation/..." and not to "collation/...". The leading "/" is important: it makes each path resolve from the server root, so that these files are looked for in the root "collation" folder rather than relative to the address of the current page.
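The difference the leading "/" makes can be seen with Python's standard urljoin function, which applies the same URL resolution rules a browser does. This is a small illustration only; the page address is the one used later in this post, and the variable names are mine.

```python
from urllib.parse import urljoin

# Address of the page as loaded in the browser.
page = "http://localhost:8080/collation/"

# With the leading "/", the path is resolved from the server root.
rooted = urljoin(page, "/collation/js/jquery-ui.min.js")

# Without it, the path is resolved relative to the current page.
relative = urljoin(page, "collation/js/jquery-ui.min.js")

print(rooted)    # http://localhost:8080/collation/js/jquery-ui.min.js
print(relative)  # http://localhost:8080/collation/collation/js/jquery-ui.min.js
```

The second address has "collation" twice, which is why the files would not be found without the leading "/".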

With this structure we can start up an instance of the Collation Editor with both Python and CollateX in place by going to the terminal and moving into the root directory thus:

cd /Applications/Collation_Editor_Core

And then starting up the instance with

./startup.sh

This calls Python 3 to start a server at localhost:8080, with the "collation" folder as the root, and running the Python .py files in the "collation/core" folder. It also starts up CollateX, from the "collatex" folder at the root, with CollateX running on another port. If all is in place, this is what you will see when you go to "http://localhost:8080/collation/" in your browser.
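If the page does not load, a quick way to find out whether the server started by startup.sh is listening at all is to try opening a connection to its port. The following is a minimal sketch using only the Python standard library; the function name port_is_open is made up for illustration and is not part of the Collation Editor.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
```

For example, port_is_open("localhost", 8080) should return True once the Python server is running. The same call, with whatever port your start-up script assigns to CollateX, checks the CollateX service.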

If you have the "data" folder from the stand-alone installation in the "collation" folder, you can type "B04K6V23" into the "Select" box and then hit the "Collate Project Witnesses" button (currently not working ...)

 

Setting up the revised Collation Editor: some history (2023)

I am a huge fan of the "Collation Editor", built by Cat Smith of the Institute for Textual Scholarship and Electronic Editing (ITSEE) at the University of Birmingham, with substantial input from Troy Griffitts, now at The Göttingen Academy of Sciences and Humanities in Lower Saxony. Some history is required. The roots of the Collation Editor lie in my Collate software, written for the Macintosh computer from 1989 on and, in its day, used heavily by multiple editing projects. Notable among these user projects were two groups editing Biblical texts: those associated with the Institute for New Testament Textual Research at Münster, Germany (INTF), and David Parker and scholars working with him at the University of Birmingham (now, ITSEE).

Part of the story of how Collate begat CollateX, and CollateX begat the Collation Editor, is told in other blogs on this site: https://scholarlydigitaleditions.blogspot.com/2014/09/the-history-of-collate.html and https://scholarlydigitaleditions.blogspot.com/2014/09/collate-2-and-design-for-its-successor.html. These blogs, though here dated 2014, were written in 2007. Other parts can be deduced from an article about the evolution of digital methods in the INTF and ITSEE written by myself, David Parker, Hugh Houghton and Klaus Wachtel (you can read that article at my Academia site, or via its DOI). 

The first part of this begetting is the making of CollateX. CollateX fulfilled completely the first part of the agenda I laid out in the blogs on this site: to create a system for comparison of multiple texts which was modular and independent of any one hardware or software implementation. CollateX is a marvel, and a remarkable achievement by the team of software engineers who made it (prominently, Ronald Dekker of the Huygens Institute, Amsterdam). 

The second part of this begetting was the making of the Collation Editor. This creates an entire environment permitting editors to create exactly the collation they want, by determining through a point-and-click interface exactly what words collate with what and how the collation is to be expressed. Essentially, the Collation Editor is an interface to, and an extension of, CollateX: permitting editors to adjust the CollateX collations to create exactly the collations they want. For me, the test of the Collation Editor, and its implementation of CollateX, was simple: could we achieve exactly the same complex collations with the Collation Editor/CollateX as we could, from 1995 to around 2015, with Collate? The answer is, triumphantly, yes. Indeed, we could achieve far more with the Collation Editor than we ever could with Collate. Here is the tool I dreamed of in 2007. (Somewhere, I said that it would take a team of ten people ten years to make the replacement for Collate. I was not far wrong.)

Accordingly, in 2016 I started work on integrating the Collation Editor into Textual Communities. We have now used this integrated implementation to collate some four thousand lines of the Canterbury Tales, in preparation for our forthcoming Critical Edition of the Tales. You can see how this works in a video I made, collating just one line of the Tales. As you can see, the Collation Editor can create exactly the highly complex collations we want. In recent years, it has become an absolutely vital part of our work on the Tales.

However, the version we integrated in 2016, and which is still the version we are using, is now seriously outdated. Many improvements have been made to the Collation Editor since 2016 (or, in effect, 2019, when we last updated our implementation of the Collation Editor) and finally, thanks to a sabbatical, I am setting out to bring the Textual Communities version of the Collation Editor up to date. This task should be greatly eased by the re-organization and rewriting of the Collation Editor since 2019. The Collation Editor code has now been cleanly divided into a "core" code library, designed so that the whole core can fit inside any implementation and be easily updated, and a "services" code library, which connects the core to whatever implementation you want. In our case, we use MongoDB document databases to store all our information about our texts, and hence everything the Collation Editor needs to function should be linked to our MongoDB databases.

In the next posts, I will explain how I went about setting up the updated core collation tools of the Collation Editor to work within Textual Communities, in the same way as a series of blogs on StaticSearch explain how I got this to work with our data.