Tuesday, 20 July 2021

Fun with Fonts. Junicode, Unicode, and ꝑ

If you see a character looking like a p with a bar through the descender in the title of this post, and you see it here too (ꝑ), then ... read on. And if you don't, then read on (and let me know!)

Thirty years ago, when Tim Berners-Lee, Lou Burnard, the web and I were much younger, every "special character" was a challenge, and a potential triumph or failure. "Special" meant something beyond ASCII 127 (ah, the acronyms!). It meant anything non-English, in the most limited Brexit sense. E-acute was used by people from across the Channel, and a few Canadians, and was not to be used without Special Equipment (in those days, a Macintosh computer). Devanagari was a distant dream, and right-to-left writing an impossibility.

Nowadays, thanks to Unicode, and the work of many unsung heroes of font design, with a special shout-out to those who sat on myriad committees and shepherded the whole process to every smartphone on the planet, we have become so used to everything appearing just right, with no effort at all on our part, that we are in danger of forgetting how many miracles had to occur so that I can insert a ꝑ in my document, and you can see it. (The best miracles are made by people working together, of course.) But every now and then, something happens to remind us of how many ducks make a row.

Like many medievalists, I am a fan of Peter Baker's beautiful Junicode font. For years, I have been happily typing ꝑ into transcriptions, Word and PDF documents. This and a few other characters are very common in many medieval vernacular and Latin manuscripts. ꝑ is used as an abbreviation for per or par, as in "person" and "parish", and so is found everywhere in Chaucer manuscripts (think of the Parson and the Pardoner). One of the great joys of Junicode is that it shows this character in a particularly elegant form.


Over the years, we have used Junicode in all our work with medieval texts, and have become so accustomed to the daily miracle of Junicode that we don't think about it. It works. "We" is all the people who work on the Canterbury Tales Project and a few other projects -- particularly Dante. I am currently working with various Dante scholars on a new publication, coming soon to a browser near you. Trust me, you will know about this when it happens. So, imagine my surprise when, after so many years of trouble-free use, my main collaborator said that our elegant Junicode p with bar appeared as a horrid oversize black character on her computer.

At first, I thought this was just an aberration, something odd about the way her computer was set up. The character appeared fine on my computer, and on various other computers I looked at, but not on hers. Why not? Down the rabbit hole I went.

By this time, we had graduated to bundling the Junicode font with our developing site, so that readers would not have to download the font to their computer. This is a well-documented process, and Font Squirrel documents it and provides neat tools to convert any font to a "webfont", easily embeddable in any web page. So I began investigating. On my computer, the character appeared fine:

  • if I had Junicode on my computer, and the font embedded in the page
  • if I had Junicode on my computer, and the font NOT embedded in the page
It did NOT appear fine if I did NOT have Junicode on my computer and had only Junicode embedded in the web page. Yet the web page showed Junicode everywhere else -- but not this character, and a few other characters. How could this be? 

I began digging. The Unicode code point for p with a bar is A751 (written U+A751, in the Latin Extended-D block). This is in the "general use" area of Unicode, which major fonts will support as a matter of course: so you can paste the ꝑ from this document into a Word document and use it in Times New Roman, Geneva, etc. When I looked at Junicode on my computer, using Apple's Font Book, p with a bar appeared as glyph 2007, Unicode A751, exactly as it should.

However, on my collaborator's computer, the same character appeared in a quite different place: as glyph 2066, Unicode E670 (on my computer, Junicode has a quite different character at that position).
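For readers who want to look at the two code points themselves, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is an invention for this illustration; the only facts it relies on are the two code points mentioned above.

```python
import unicodedata

def describe(code_point: int) -> dict:
    """Return a few basic Unicode facts about a single code point."""
    ch = chr(code_point)
    return {
        "code_point": f"U+{code_point:04X}",
        # Private-use code points carry no official name, hence the default.
        "name": unicodedata.name(ch, "<no name>"),
        # 'Ll' is a lowercase letter; 'Co' marks a private-use code point.
        "category": unicodedata.category(ch),
        # The bytes this character becomes when a file is saved as UTF-8.
        "utf8": ch.encode("utf-8").hex(" "),
    }

# The standard code point, then the one seen on the collaborator's machine.
for cp in (0xA751, 0xE670):
    print(describe(cp))
```

The first entry should report an official name and the letter category, while the second reports category `Co`: Unicode itself says nothing about what a font should draw there.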

What is going on? Why is her Junicode different from mine? On digging about, it appears that some time in the past, Junicode indeed had this character at E670. The "E" and "F" Unicode ranges (strictly, U+E000 to U+F8FF) are "Private Use" areas, and it appears that up to the time when p with a bar was allocated A751 in the "general use" area, Junicode put p with a bar in the "private use" area, with that encoding. This is a rather long story, involving a group called the Medieval Unicode Font Initiative (MUFI). One of the aims of this group was to have "core" characters, judged as essential to scholars working with medieval western European texts, incorporated into the "official" Unicode encoding. As of Unicode 5.1, 152 MUFI characters -- among them, p with a bar -- had made it into official Unicode. It appears that my version of Junicode reflects this shift of p with a bar into official, post-5.1 Unicode. The version of Junicode on Prue's computer did not.
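If old transcriptions still contain the private-use code point, one way to bring them up to date is a simple character translation. The sketch below is an assumption-laden illustration, not anything Junicode or MUFI provides: the table `LEGACY_TO_STANDARD` is hypothetical and contains only the single pair of code points reported in this post, and the function names are invented here.

```python
import unicodedata

# Hypothetical mapping, built only from the two code points named in this post.
# A real migration would need the full table for the font version in question.
LEGACY_TO_STANDARD = {0xE670: 0xA751}

def modernise(text: str) -> str:
    """Replace known private-use code points with their standard equivalents."""
    return text.translate(LEGACY_TO_STANDARD)

def remaining_private_use(text: str) -> list:
    """List any private-use code points still left in the text."""
    return sorted(
        {f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"}
    )

old = "\ue670son"            # "person" abbreviated with the legacy code point
new = modernise(old)
print(new == "\ua751son")    # True
print(remaining_private_use(new))  # []
```

Checking for leftover private-use characters after the translation is a cheap way to spot anything the mapping table did not cover.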

More digging. By this time, I was suspecting that the embeddable version of Junicode did not have p with a bar at A751. But why did it display correctly on my computer? It appears that somewhere deep in the innards was an instruction to the effect: if the browser could not find the character in the embedded font, look elsewhere. So it looked in the Junicode on my computer, found the character and displayed it. It did this even when I tried to fool it by calling the embedded font something else ("junicoderegular") in the CSS style sheet. However, on my collaborator's computer the installed Junicode had no character at A751, and so the browser showed an A751 from another font altogether.
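The behaviour described here, where the browser falls back character by character to the next font that has the glyph, can be modelled in a few lines. This is a deliberately simplified sketch, not how any real browser is implemented: the font stack, the coverage sets, and the choice of Times New Roman as the eventual fallback are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

def pick_font(ch: str, font_stack: List[str],
              available: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font in the stack whose coverage includes ch."""
    for name in font_stack:
        if ord(ch) in available.get(name, set()):
            return name
    return None  # no font has it: the reader sees a box or a blank

P_BAR = "\ua751"
STACK = ["junicoderegular", "Junicode", "Times New Roman"]

# Hypothetical coverage: the embedded webfont lacks U+A751 on both machines.
WEBFONT = {0x70}
my_machine = {
    "junicoderegular": WEBFONT,
    "Junicode": {0x70, 0xA751},         # newer install: p-bar at U+A751
    "Times New Roman": {0x70, 0xA751},
}
her_machine = {
    "junicoderegular": WEBFONT,
    "Junicode": {0x70, 0xE670},         # older install: p-bar at U+E670
    "Times New Roman": {0x70, 0xA751},
}

print(pick_font(P_BAR, STACK, my_machine))   # Junicode
print(pick_font(P_BAR, STACK, her_machine))  # Times New Roman
```

Under these assumptions the same page yields a different font for the same character on each machine, which is why the problem was invisible on a computer with a recent Junicode installed.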

Eventually, after scores of emails and hours of digging, I concluded that the root of the problem lay in the embedded font. Somehow, this embedded Junicode did not have p bar where it should be. So I set to trying to correct this. First I went to the Font Squirrel webfont generator.

I uploaded the Junicode TTF from my computer, Font Squirrel converted it to a "webfont", and all seemed fine. Nope. Same problem. I dug deeper. I went to Peter Baker's "Junicode" page on Font Squirrel and used the "webfont kit" generator on that page. Nope. Same problem. With increasing desperation, I noticed that the page offered a choice of "subsets".

So, I chose "no subsetting" and created the webfont. And at last! it worked!
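Why subsetting would cause this can be shown with a toy model. A font's character map ties code points to glyphs, and a subsetter keeps only the code points inside the ranges it is told to keep. The sketch below assumes a made-up three-entry character map and a made-up "Western" subset of Basic Latin plus Latin-1; these are not Font Squirrel's actual subset definitions, which the post does not give.

```python
from typing import Dict, Iterable, Optional

def subset(cmap: Dict[int, str],
           keep: Optional[Iterable[range]]) -> Dict[int, str]:
    """Keep only the cmap entries whose code point falls in a kept range.

    Passing keep=None models "no subsetting": everything survives.
    """
    if keep is None:
        return dict(cmap)
    ranges = list(keep)
    return {cp: glyph for cp, glyph in cmap.items()
            if any(cp in r for r in ranges)}

# Hypothetical character map: code point -> glyph name.
FULL_FONT = {0x70: "p", 0xE9: "eacute", 0xA751: "uniA751"}

# Hypothetical "Western" subset: Basic Latin and Latin-1 Supplement only.
WESTERN = [range(0x20, 0x7F), range(0xA0, 0x100)]

print(0xA751 in subset(FULL_FONT, WESTERN))  # False: p-bar is dropped
print(0xA751 in subset(FULL_FONT, None))     # True: p-bar survives
```

Ordinary letters and e-acute pass through either way, so a subsetted page looks right almost everywhere, and only the rare characters outside the kept ranges go missing.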

All this for characters which appear just five times in some 2400 pages of manuscript transcription. 

This tale casts into relief the many rough edges that exist in the interplay of fonts, glyphs, character code points, Unicode ranges, and encoding systems (UTF-8? or UTF-16? BOM or not?), all playing against multiple versions as all of these evolve and agreements are forged and renewed. The wonder is that problems like these occur so rarely.