Gallimaufry of Whits, by Sean B. Palmer

TextWrangler had various open unsaved notes, so I saved them all in a big file. Going through them proved to be a good summary of things I've been working on for a fortnight or so. For example, I set a puzzle to decipher the following:

def process(a, b):
    table = {}
    table[a] = b
    while a > 1:
        a //= 2  # floor division, so a stays an integer under Python 3
        b *= 2
        table[a] = b
    return sum(b for (a, b) in table.items() if a % 2)

And I found out, even though I don't use Twitter, that my Twitter ID is very small but not the smallest amongst my friends. I also did some major license research and found that moral rights are protected by default in Europe and supposedly in all Berne Convention signatory countries, meaning that licenses are perhaps redundant for my requirements.

For some reason I also did a small survey of URI shorteners, and found that u.nu and tweak.tk are probably the best, though neither supports data URIs. I also made a URI shortener for Flickr, or rather an interface to their own shortener service. I helped out on a script that at one point used wget -pHEKk, and removed MacPorts from my system as part of a local installation experiment.

Running my essay on Pretty Girls through a gender analysis script made it come up as being written by a female, which is a good demonstration of naïve computational linguistics failing like crap.

Also I downloaded all of Wikipedia and made a word frequency table from it. Or, rather, Adam Wendt and Terje Bless made frequency tables from it.
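How Adam Wendt and Terje Bless built their tables isn't described here, so what follows is only a minimal Python sketch of the general idea, assuming the article text has already been extracted from the dump as plain lines. The name `word_frequencies` and the rather naive notion of a “word” in the `WORD` pattern are my own assumptions; real Wikipedia text, with its markup and many scripts, would need much more careful tokenising.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters or digits, optionally joined by
# apostrophes (so "didn't" counts as one word). Deliberately naive.
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def word_frequencies(lines):
    """Count case-folded words across any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(match.group(0).casefold() for match in WORD.finditer(line))
    return counts


freqs = word_frequencies(["the cat", "the hat", "The cat"])
print(freqs.most_common(2))  # [('the', 3), ('cat', 2)]
```

Since it accepts any iterable of lines, an open text file can be passed in directly, so the text is read a line at a time rather than all at once.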

Someone asks your opinion on whether it's going to rain in the next hour, and you look out of the window and say “probably not”. They ask you what the chance is. You say “twenty per cent?”. What kind of things might you base such an estimate on?

Obviously the amount and density of cloud cover, and your experience with that, will play a big part. Subtler things, such as having seen a barometer somewhere earlier in the day, may also play a part. There may even be bias from other sources, such as having just seen a programme on the television about thunderstorms, or having shopped for a new umbrella.

But there's one big obvious bias that I missed initially when thinking about this: the fact that humans commonly have ten fingers. This is a major reason why we use base ten (decimal) instead of, say, base twelve (duodecimal) or base eight (octal). So when you pick 20%, in a sense that reflects the fact that you have ten fingers as much as what you saw out of the window.

Of course, it's hard to say how probability syntaxes would have developed anyway had we not used decimal. In duodecimal, for example, “100” means decimal 144, so a round-looking “20%” would mean 24 in 144, or one in six, and the closest you can get to one-in-five is “25%”, which is about 20.14% in decimal. In octal, where “100” means decimal 64, the closest that you can get to one-in-five is “15%”, which is 20.3125% in decimal.
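The arithmetic is easy to check with a short Python sketch, which finds the nearest whole number of parts out of base² (“100” written in that base) to a given probability, and writes that number in the base. The helper names and the digit symbols A and B for ten and eleven are illustrative assumptions; no digits above 9 are actually needed for these three cases.

```python
from fractions import Fraction

# Digit symbols for bases up to twelve; A and B for ten and eleven are an
# arbitrary choice made for this sketch.
DIGITS = "0123456789AB"


def to_base(n, base):
    """Write a non-negative integer in the given base (2 to 12)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def nearest_per_base_squared(p, base):
    """Nearest whole number of parts out of base**2 to probability p.

    Returns that numerator written in the base, plus its exact value
    as a decimal percentage.
    """
    parts = base * base
    numerator = round(p * parts)  # round() on a Fraction returns an int
    return to_base(numerator, base), float(Fraction(numerator, parts) * 100)


for base in (8, 10, 12):
    written, percent = nearest_per_base_squared(Fraction(1, 5), base)
    print(f"base {base}: {written}% is {percent:.4f}% in decimal")
# base 8: 15% is 20.3125% in decimal
# base 10: 20% is 20.0000% in decimal
# base 12: 25% is 20.1389% in decimal
```

Using `Fraction` rather than floats keeps the intermediate values exact, so the only rounding is the deliberate one to a whole number of parts.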

This isn't to say, of course, that we'd be likely to use a percent system in another base anyway: for a start, 100 in octal isn't a cent in our decimal, etymological sense of cent, so we wouldn't call it that unless cent meant 8². But there's no particular reason why we should use the square of the base as our top range anyway. Why not just use the base? Why don't we have a perdeca system even in decimal?

OS X 10.5 Leopard requires certain directories — such as Desktop, Library, and Sites — in the user's home directory. This is annoying if you want to use your own scheme, because programs tend to populate these with their own data. For example, in Documents I have things like “FontAgent Pro Fonts” and “uDigWorkspace” that I'm pretty sure I didn't add myself, and various programs love to make a mess of the Music directory. So there are three major organisational choices one can make here:

1. Just use the scheme that OS X provides, and ignore the mess.
2. Create your own directory names, such as tunes or media instead of Music, and use those.
3. Create a directory somewhere else as a pristine organisational space.

Currently I'm using a mixture of (1) and (2), but I've been thinking about switching to (3). The main problem here, of course, is deciding what such a directory should be called. I've been thinking about /sbp, /home/sbp, ~/my, ~/home, and ~/files. None of these is optimal, because they all require more typing than plain ~/, which would obviously be the best option after a plain /. That's part of my annoyance, really: it seems strange that the OS should steal both / and ~/.

One nice logical option for the user space would be ~~/, but even though zsh supports named directories, it appears that it doesn't support ~ as a name:

$ eval "~=/tmp" && : ~~  
zsh: no such file or directory: ~=/tmp

And anyway, you don't want to rely on shortcuts too much in case you're having to use sh or bash. I tend, for example, to use bash for scripts even though I use zsh as my interactive shell.

Since most major browsers now support @font-face, I've been trying to select a good font to replace the Georgia that I commonly use. The main criteria are that it has to be well balanced, easy to read, and not require some specific font-size and line-height to get the most out of it.

Obviously it also needs to be licensed for use on the web. Some people are starting to ignore the licensing terms of fonts from well-established commercial foundries, and just use them on the web anyway. Others call for boycotting the foundries altogether and using fonts with sensible licenses instead.

Since foundry fonts tend to have better hinting and so on, I experimented first with ITC Galliard, Joanna MT Std, and Adobe Garamond Pro. Of these, though Joanna is very beautiful, Galliard performed the best. But looking through my sample sheet of open fonts, I found a font called Day Roman which gave a similar impression to Galliard.

Day Roman is Pedro Reina's reproduction of Two Line Double Pica Roman, a font by the 16th-century French typographer François Guyot. On OS X, with proper anti-aliasing and so forth, it looks brilliant. The “@” is a little strange in that it uses a double-storey “a”, but at least that preserves the authenticity of the original. It's interesting that a font from the 16th century looks so well placed in modern design.

“C'est une nation, diroy-je à Platon, en laquelle il n'y a aucune espece de trafique; nulle cognoissance de lettres; nulle science de nombres; nul nom de magistrat, ny de superiorité politique; nul usage de service, de richesse, ou de pauvreté; nuls contrats; nulles successions; nuls partages; nulles occupations, qu'oysives; nul respect de parenté, que commun; nuls vestemens; nulle agriculture; nul metal; nul usage de vin ou de bled. Les paroles mesmes, qui signifient la mensonge, la trahison, la dissimulation, l'avarice, l'envie, la detraction, le pardon, inouyes. Combien trouveroit il la republique qu'il a imaginée, esloignée de cette perfection?”

— Montaigne

“It is a nation, would I answer Plato, that hath no kinde of traffike, no knowledge of Letters, no intelligence of numbers, no name of magistrate, nor of politike superioritie; no use of service, of riches or of povertie; no contracts, no successions, no partitions, no occupation but idle; no respect of kindred, but common, no apparell but naturall, no manuring of lands, no use of wine, corne, or mettle. The very words that import lying, falshood, treason, dissimulations, covetousnes, envie, detraction, and pardon, were never heard of amongst them. How dissonant would hee finde his imaginarie common-wealth from this perfection?”

— Florio

Gon. I'th'Commonwealth I vvould (by contraries)
Execute all things: For no kinde of Trafficke
Would I admit: No name of Magistrate:
Letters should not be knowne: Riches, pouerty,
And vse of seruice, none: Contract, Succession,
Borne, bound of Land, Tilth, Vineyard none:
No vse of Mettall, Corne, or Wine, or Oyle:
No occupation, all men idle, all:
And Women too, but innocent and pure:
No Soueraignty.

— Shakespeare

Somebody asked about Iris not having links to individual posts. This feature, or rather the lack of a feature, is actually by design.

A while ago I was updating my homepage to link to some of the less ridiculous things that I've published on the web. Looking back at these old pages, I found that though their content was still good, a lot of the appurtenances of design had faded in value. So things like site names, navigation, and styles were increasingly coming into conflict with the pages' content.

I thought about what kind of things one could do to avoid this problem. There were five main design patterns that I came up with, and I was planning to write about these in an essay series called Spartan Hypertext:

Create Neutral Settings
Suit Titles to Content
Design Simply and Distinctly
Give Overviews Early
Work on Substantial Pages

I did write some material about them, but design at this kind of level is a bit woolly. In a way it might be better to write about it prototypically, linking to pages that exemplify what I mean. The two design principles or rules of thumb that I noted for Whits, keeping it simple and allowing rebuilding in place, are obviously represented in the list above as “Simply and Distinctly” and “Neutral Settings” respectively.

There ought to be a Descriptivist HTML site which notes every construct implemented in every browser, or mentioned in every specification. The constructs should be prioritised along two dimensions: by the current support, in terms of browsers and other consumption; and by usage, in terms of HTML documents in the wild and other production.
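As a rough sketch of prioritising along those two dimensions, here is one way such a site might model its records in Python. The `Construct` record, its field names, and the choice to rank by breadth of support first and usage second are all assumptions for illustration; a weighted score combining the two would be an equally reasonable design. The example constructs and counts are placeholders, not measurements.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Construct:
    """One HTML construct as a hypothetical descriptivist site might record it."""
    name: str
    # Consumers (browsers and other tools) known to handle the construct.
    supported_by: frozenset = field(default_factory=frozenset)
    # How many surveyed documents in the wild use it.
    documents_using: int = 0


def prioritise(constructs):
    """Rank by breadth of support first, then by usage, most widespread first."""
    return sorted(
        constructs,
        key=lambda c: (len(c.supported_by), c.documents_using),
        reverse=True,
    )


# Placeholder data only, not real measurements.
examples = [
    Construct("construct-a", frozenset({"browser-1"}), 500),
    Construct("construct-b", frozenset({"browser-1", "browser-2"}), 10),
    Construct("construct-c", frozenset({"browser-1", "browser-2"}), 40),
]
print([c.name for c in prioritise(examples)])
# ['construct-c', 'construct-b', 'construct-a']
```

Ranking lexicographically means a widely supported but rarely used construct always outranks a heavily used but poorly supported one; whether that's the right trade-off is exactly the kind of editorial decision such a site would have to make.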

Henri Sivonen's Doctypes and Browser Modes page is an example of what I mean. But imagine if you wanted to look up which browsers, for example, barf on an unencoded ampersand. A site containing such minute details would of course be difficult to maintain, but very useful.

Jacob Kaplan-Moss wrote an essay about this, and got some very confused comments on it. Someone suggested that a notorious standards effort called HTML 5 was descriptivist in this sense. That effort has indeed produced much good descriptivist work which may be worth collecting, but its product is decidedly not descriptivist, and nor should it be.

Two important rules of thumb for Ambient Information:

Usefulness has to be coded somewhere
Established patterns for URIs are important

The point about usefulness means that if you have a meta-format like SGML, XML, RDF, YAML, or JSON, the extent to which their common tools help you out is minimal. When you think about XML especially, for example, you realise that often you can make much smaller parsers and easier-to-read formats if you just come up with your own syntax.
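To make that concrete, here is a sketch of the kind of small parser a home-made syntax allows. The format is a hypothetical one invented for this example: “key: value” lines, with a blank line between records. The whole parser is a couple of dozen lines of Python and needs nothing beyond the language itself.

```python
def parse_records(text):
    """Parse a hypothetical plain-text format into a list of dicts.

    Each non-blank line is "key: value"; a blank line ends the current record.
    """
    records, current = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        current[key.strip()] = value.strip()
    if current:  # the last record needn't be followed by a blank line
        records.append(current)
    return records


example = """title: Gallimaufry of Whits
uri: http://example.org/whits

title: Iris
file: feed.py
"""
print(parse_records(example))
# [{'title': 'Gallimaufry of Whits', 'uri': 'http://example.org/whits'}, {'title': 'Iris', 'file': 'feed.py'}]
```

Splitting on the first colon only means that values such as URIs, which contain colons themselves, survive intact.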

The point about URIs is that when you use them for something like XML namespaces or the ridiculous conceptual stuff in RDF, then people start to ask what's going on. The prototypical use of an HTTP URI is to put it in a browser, and a webpage comes up. If you start using a URI peculiarly, people are going to be confused. Therefore, use URIs in such a way that this peculiarity doesn't occur.

Before the Gallimaufry of Whits, I had dozens of essay sites. Some of them lasted a year or two, and some of them only had a few articles on them. So when I created yet another site, I wasn't expecting it to be anything different.

Now that Whits has surpassed all of its predecessors in this sense, however, it's clear what the problem was. It wasn't that I was unhappy with writing or writing themes; it was that I was unhappy with structures and styles.

In other words, I change my mind all the time on trivial things about weblogs: where the archives should go, whether posts should be separate or on a single page, whether they should have titles or just automatic dates. Styles and structures on this level seem a nuisance, but the underlying things actually written on the site are mainly orthogonal to this.

So Whits has been successful because it purposely concentrated on two design guidelines. The first was that the site should be as simple as possible to publish to. Sometimes I've crammed in a few more features that I didn't really need, but in general I've tried to make it so that you just enter text and publish it. That bit is tricky, but it's not really an insight on the same order as the second guideline: that the site structure should be mutable in place.

This is the fundamental thing which has meant that this site can retain some continuity, rather than being just another scrappy project. When I get annoyed with all of the structural and stylistic aspects, I just rewrite the code, but I do it in place. So far there have been six versions of the Whits code, and I've just written the seventh version.

This seventh version, Iris, concentrates on the two guidelines perhaps more than any previous version. For a start, it's by far the shortest and simplest of all the codebases so far: just one 39-line Python file, feed.py. Secondly, it aims to have as light a footprint as possible, as pioneered by the sixth version, Hebe. There are no difficult dependencies to maintain, and it should be trivial to update the code to Python 3.

There used to be a few sites using the initial Whits code, but then I think they started to add features. The new code is even more spartan, so it would be interesting if anybody else were able to keep to an unmodified Iris installation.