Sunday, 30 July 2023

Setting up staticSearch for our projects: nested files and multiple search entry points

I now realize (a week later!), after looking at the staticSearch projects listed in the documentation, that our projects differ (it seems) from all other staticSearch implementations to date in two ways I had not known about:

  1. staticSearch assumes (or at least, all the listed projects appear to follow this model) that all the pages to be searched are held in the same folder as the root index.html file. Indeed, a 2019 presentation by the staticSearch team explicitly declares that "All pages live together in the same folder" and, furthermore, "We don't care" if that means there are 10,591 files in that one folder.
  2. staticSearch assumes (or at least, all the listed projects appear to follow this model) that all searches are launched from a single place, and a single file, contained in that same folder holding all the project files.
Neither of these assumptions holds good for our projects. I anticipate that the Canterbury Tales Project when complete (!) will require somewhere around 90,000 distinct html files: one for each of the 29,000 manuscript pages in which the Tales occur, plus three files for each of the some 20,000 entities (lines of poetry, blocks of prose) which constitute the text of the Tales. I, for one, am not comfortable with around 90,000 files in a single folder. We devised a uniform directory structure to hold all these files. The transcript of folio 1r in Hengwrt is held in "html/transcripts/Hg/1r.html"; the collation of the first line of the General Prologue is held in "html/collations/GP/1.html". By design, then, all our html files are buried four layers below the "home" folder holding our index.html file.
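To make the layout and the file count concrete, here is a minimal sketch in Python. The two example paths are the ones quoted above; the function names (transcript_path, collation_path) are hypothetical helpers invented for illustration and are not part of our build scripts or of staticSearch.

```python
from pathlib import PurePosixPath

# Hypothetical helpers: they only illustrate the uniform directory
# scheme described above, using the two example paths from the text.
def transcript_path(siglum: str, folio: str) -> PurePosixPath:
    """Path of the transcript page for one folio of one manuscript."""
    return PurePosixPath("html", "transcripts", siglum, f"{folio}.html")

def collation_path(section: str, line: int) -> PurePosixPath:
    """Path of the collation page for one line of one section."""
    return PurePosixPath("html", "collations", section, f"{line}.html")

# Rough file-count estimate from the figures given in the text.
manuscript_pages = 29_000
entities = 20_000
files_per_entity = 3
estimated_files = manuscript_pages + files_per_entity * entities

print(transcript_path("Hg", "1r"))
print(collation_path("GP", 1))
print(estimated_files)
```

The estimate comes to 29,000 + 3 × 20,000 = 89,000 files, which is the "around 90,000" referred to above.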

In fact, we discovered that the '<recurse>true</recurse>' setting in the configuration file means that staticSearch has no problem at all with nested directories. It duly finds and indexes all our html pages. But the second issue -- that the default staticSearch configuration expects that all searches will be run from a single file, located in the project home directory -- does cause problems. We could, quite easily, have set up our projects the way staticSearch expects, so that clicking on a "search" icon or similar on each of the 90,000 pages would send the reader to a single search page, presumably in the home directory. But we did not want to do that. Here is how the header for one of our project pages looks (for folio 72r of the Naples manuscript of the forthcoming Agostinelli/Coleman edition of Boccaccio's Teseida):

A fundamental principle of our edition design is "have only the pages you really need". We want our readers to be able to run the search directly from the page they are looking at, and not have to go to any other page to do the search. Further, we want the header on all our pages to look the same, following another mantra: "keep everything as uniform as possible across the whole edition". This meant that every one of our (possibly) 90,000 html pages would have a search box on it, as you can see in the top right of this image. It also means that searches would not always begin from a file located at the root of the project folder. Indeed, all searches except those run from the index.html starting point to our editions would begin from a file nested four layers deep in the project folder. And that is why we found the problems with folder paths referred to in the previous post.
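The path problem can be illustrated with a short Python sketch. A page that hosts its own search box has to reach shared search resources by climbing back up to the project root, and the number of "../" steps depends on where the page sits. This is only an illustration of the general idea, not staticSearch's own behaviour: the function name prefix_to_root and the resource path "search/index.json" are assumptions made up for the example.

```python
from pathlib import PurePosixPath

def prefix_to_root(page_path: str) -> str:
    """Return the relative prefix ("../" repeated) that leads from the
    folder containing page_path back up to the project root."""
    depth = len(PurePosixPath(page_path).parent.parts)
    return "../" * depth

# Hypothetical location of a shared search resource, relative to the root.
SEARCH_RESOURCE = "search/index.json"

for page in ["index.html", "html/transcripts/Hg/1r.html"]:
    # Each page needs a different relative URL for the same resource.
    print(page, "->", prefix_to_root(page) + SEARCH_RESOURCE)
```

A page at the root needs no prefix at all, while the Hengwrt transcript page needs "../../../" in front of every root-relative resource, which is why a single hard-coded relative path cannot serve both.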

I will post a suggestion in the staticSearch issues forum as to how staticSearch itself could help projects configured like ours, with many files spread over multiple folders and each file being a search access point. In a final post in this series, I offer some general thoughts about staticSearch.
