HTML as She is Spoke

HTML as She is Spoke is an idea for a descriptivist guide to HTML. Such a guide would let one know, for example, how many browsers support “--” in comments, or how many choke on raw ampersands in text. Since this is a descriptivist idea, HTML can be taken to mean any of HTML, CSS, or JavaScript, so constructs aren't limited to markup alone. The guide should also keep track of prescriptivist developments, since popular prescriptivisms often become descriptivisms. The idea would be for it to become an authoritative source for people making HTML.

All current HTML specifications, including HTML5, are prescriptivist. Some of the prescribed constructs may be based on descriptivist observations, but the entire purpose of a specification is to prescribe a document type to which there is boolean conformance: you either conform or you do not. With descriptivism, there are shades of similarity to current practice, but no document can ever be said to be conforming or not conforming, since there is no standard to which it can conform.

As an example of the difference between being based on descriptivist practices and actually being descriptivist, consider HTML's relationship to SGML. In HTML5, one of the great leaps forward is to say that HTML5 is not based on SGML. This was born of a survey of existing HTML consumers, seeing how many of them use SGML constructs. In Spoke, however, the idea would be to simply document the ratio and the various interesting uses of SGML and non-SGML parsing of HTML.

There could be two main audiences for such a project. One may think of it as primarily author facing, and the authors can themselves be broken down into two camps. The first are web developers, the kind of people who write HTML all day, either by hand or using templating systems, or any HTML authoring agent which gives them some level of control over their markup. The second author camp consists of the authors of HTML authoring agents themselves, who are a much smaller camp but get to dictate the kind of HTML that anybody who uses their applications will create, so they're an important bunch. Secondarily, one may also think of it as being oriented towards HTML UAs, for people who write the programs that consumers of HTML use, usually browsers.

Of course, a purely descriptivist work about HTML could become very large and complex. This does not, however, mean that you can't huffmanise it. If you think of all HTML constructs as equally weighted, then they would certainly be difficult to navigate, but in fact a decent piece of descriptivist documentation would order things by frequency of use. So for example, isindex in HTML 2 is not going to be as prominently written about as, say, the p element. Such a work should enable people to break the standards hegemony and produce sensible HTML.
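
As a minimal sketch of what ordering by frequency of use might look like, the following Python sorts constructs so that the most used come first. The function name order_by_frequency and the usage counts are hypothetical, invented purely for illustration; they are not measurements of real HTML. It is also only the ordering half of the idea, not Huffman coding proper.

```python
from collections import Counter


def order_by_frequency(counts):
    """Return construct names, most used first; ties are broken alphabetically."""
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical usage counts, made up for this example only.
example_counts = Counter({"p": 9500, "a": 8700, "div": 9900, "isindex": 2})

# The guide would then write about constructs in this order.
ordering = order_by_frequency(example_counts)
print(ordering)
```

With counts like these, a rarely used construct such as isindex falls to the end of the list, which is where a reader would expect to find it in the guide.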

Who would write such a thing? Spoke may be written as a Wikibooks project. Wikibooks is a Wikimedia Foundation site which hosts books that people have written about mainly academic subjects. The social development of the site would then be as unlimited, potentially, as Wikipedia. Moreover, there are various nice aspects of the idea itself that reinforce the social possibilities. People argue in standards development, for example, because they disagree about which direction ought to be taken. They argue about which prescriptivism is better than another. With descriptivism, however, it's much more difficult to have such arguments.

Data for such a project should be moderately easy to find. There are various compatibility tables and so forth for the various existing browsers, and large HTML corpora are available from sources such as the Internet Archive. Cloud computing services such as Amazon EC2 may also be helpful for processing them. Statistical analysis techniques could be researched and honed. Overall, such a project could be quite organic and grow as HTML grows, helping authors to keep on top of the many issues in web design at the format level.
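
To give a rough idea of how a corpus might be measured, here is a small sketch that counts how often each element's start tag appears across a set of documents. It uses Python's standard html.parser module, which is one tolerant parser among many and not a browser, so the numbers only describe what that parser sees. The names TagCounter and count_tags, and the two-document sample corpus, are assumptions made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags; HTMLParser lowercases tag names before reporting them."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(documents):
    """Return the combined start tag counts for an iterable of HTML strings."""
    total = Counter()
    for document in documents:
        parser = TagCounter()
        parser.feed(document)
        parser.close()
        total.update(parser.counts)
    return total


# A tiny invented corpus, standing in for real crawled pages.
sample_corpus = [
    "<p>One<p>Two <a href=x>link</a>",
    "<div><p>Three &amp; four</p><isindex></div>",
]
tag_counts = count_tags(sample_corpus)
print(tag_counts.most_common())
```

The same loop could be pointed at a much larger corpus, and the resulting counts are the kind of figures on which the frequency ordering of the guide could be based.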


by Sean B. Palmer, 6th February 2010