When I designed HTML for the Web, I chose to avoid giving it more power than it absolutely needed - a 'principle of least power', which I have stuck to ever since. I could have used a language like Donald Knuth's 'TeX', which though it looks like a markup language is in fact a programming language. [...] It would allow you to express absolutely anything on the page, but would also have allowed Web pages that could crash, or loop forever. This is the tension.
Tim Berners-Lee (1999), Weaving the Web, p.197
Some say this isn't a bad thing. Alan Kay said in 1997 that the principle of least power was a thing of the dark ages:
I was in the Air Force in 1961, and I saw it in 1961, and it probably goes back one year before. Back then, they really didn't have operating systems. Air training command had to send tapes of many kinds of records around from Air Force base to Air Force base. There was a question on how can you deal with all of these things that used to be card images, because tape had come in, [there] were starting to be more and more complicated formats, and somebody—almost certainly an enlisted man, because officers didn't program back then—came up with the following idea.
This person said, on the third part of the record on this tape we'll put all of the records of this particular type. On the second part—the middle part—we'll put all of the procedures that know how to deal with the formats on this third part of the tape. In the first part we'll put pointers into the procedures, and in fact, let's make the first ten or so pointers standard, like reading and writing fields, and trying to print; let's have a standard vocabulary for the first ten of these, and then we can have idiosyncratic ones later on. All you had to do [to] read a tape back in 1961, was to read the front part of a record—one of these big records—into core storage, and start jumping indirect through the pointers, and the procedures were there.
I really would like you to contrast that with what you have to do with HTML on the Internet. Think about it. HTML on the Internet has gone back to the dark ages because it presupposes that there should be a browser that should understand its formats. This has to be one of the worst ideas since MS-DOS. [Laughter] This is really a shame. It's maybe what happens when physicists decide to play with computers, I'm not sure. [Laughter]
Alan Kay (1997), The Computer Revolution Hasn't Happened Yet
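Kay's tape layout is a self-describing record: a pointer table up front, the procedures in the middle, the data at the end, with the first few pointer slots reserved for a standard vocabulary. A toy sketch in Python (all names here are mine, not from any Air Force specification):

```python
# A toy model of the 1961 tape record Kay describes: each record
# carries its own procedures, and a front table of pointers into
# them. The first slots form a standard vocabulary (read, write,
# print), so a reader needs no prior knowledge of the format.

def make_record(fields):
    procedures = {
        "read":  lambda data, i: data[i],
        "write": lambda data, i, v: data.__setitem__(i, v),
        "print": lambda data: " ".join(map(str, data)),
    }
    # Standard vocabulary first; idiosyncratic entries could follow.
    pointers = ["read", "write", "print"]
    return {"pointers": pointers, "procedures": procedures, "data": fields}

def dispatch(record, slot, *args):
    # "Start jumping indirect through the pointers": the reader only
    # follows the front table; the code travels with the data.
    name = record["pointers"][slot]
    return record["procedures"][name](record["data"], *args)

record = make_record([1961, "USAF", "tape"])
dispatch(record, 1, 0, 1962)           # slot 1 = write: data[0] = 1962
assert dispatch(record, 0, 0) == 1962  # slot 0 = read
```

The point is that the reading program knows only the convention for the front table, not the format of the data; everything else it learns from the record itself.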
This leads to the natural theory that all computer formats tend to become Turing complete. Perhaps this is only true as a corollary of Zawinski's Law. But the stages of development are clear, and they end the same way: congratulations, your format is now Turing complete.
If a format was designed with forward extensibility, with a standard vocabulary as mentioned by Kay, then stage three, shoehorning the tools into the format, can be done with ease. Dan Connolly used to trumpet this forward extensibility rule at frequent intervals. That it had already been invented in the US Air Force circa 1960 indicates that it's a natural concept.
Imagine, for example, that you chose the magic string `Link: ` to indicate a link that should be automatically loaded when a text file is browsed. This would be a very poor choice, given that `Link:` is also the name of an HTTP header, and there exist plain text dumps of HTTP headers which contain this header.
Even if you used a UUID or another URI for the magic string, what happens when somebody wants to write about the mechanism itself, without invoking the mechanism, in a text file? There would have to be some escape mechanism, a concept invented by Bob Bemer, for this to be possible.
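The collision is easy to demonstrate. A sketch, using the hypothetical `Link: ` convention from above and a leading backslash as the escape (my invention for illustration, not any standard):

```python
# A naive scanner for a hypothetical "Link: " magic string in plain
# text, showing both the false positive and why escaping is needed.

def extract_links(text):
    links = []
    for line in text.splitlines():
        if line.startswith("\\Link: "):
            continue  # escaped: talk *about* the mechanism without invoking it
        if line.startswith("Link: "):
            links.append(line[len("Link: "):])
    return links

essay = "Link: http://example.org/next\n\\Link: this line is only discussed\n"
header_dump = "HTTP/1.1 200 OK\nLink: <http://example.org/>; rel=preload\n"

assert extract_links(essay) == ["http://example.org/next"]
# False positive: the header dump is also plain text, so the scanner
# "finds" a link the author never intended.
assert extract_links(header_dump) == ["<http://example.org/>; rel=preload"]
```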
On the web, the most reasonable approach would be not to use a magic string, but to use a new media type, such as `text/scriptable-plain`, to authoritatively indicate the new type. In other words, the forward extensibility mechanism of the web is the media type. The magic string is a badly engineered alternative. In individual formats, the forward extensibility mechanism can vary. HTML had an `@profile` attribute on the `head` element, for example, which was eventually beaten by the `script` element; whereas JSON has no means of forward extensibility.
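Dispatching on the declared media type, rather than scanning the content, keeps the indication authoritative. A sketch, with `text/scriptable-plain` taken from the text and the handler table my own assumption:

```python
# Dispatch on the declared media type instead of sniffing for a magic
# string. The type is asserted out of band (e.g. an HTTP Content-Type
# header), so plain text *about* the mechanism can never trigger it.

def handle(media_type, body):
    handlers = {
        "text/plain": lambda b: ("display", b),
        # Hypothetical type from the text: plain text whose links are
        # live, indicated authoritatively by the type alone.
        "text/scriptable-plain": lambda b: ("display+links", b),
    }
    # Unknown types fall back to inert display: forward extensible.
    return handlers.get(media_type, handlers["text/plain"])(body)

assert handle("text/plain", "Link: x") == ("display", "Link: x")
assert handle("text/scriptable-plain", "Link: x")[0] == "display+links"
assert handle("text/future-format", "?")[0] == "display"
```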
The conclusion is that formats which have no means of forward extensibility are doomed to admit non-authoritative scraping facilitators, such as magic strings and hot comments, in order to evolve.
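Python's source encoding declaration is a live example of such a hot comment: lacking any extensibility slot in the `.py` format, PEP 263 has the interpreter scrape the first two lines of a file with a regular expression. A simplified sketch (the real PEP 263 pattern is slightly stricter):

```python
import re

# PEP 263: Python scrapes the first two lines of a source file for an
# encoding declaration hidden in a comment -- a "hot comment", because
# the .py format had no authoritative slot for such metadata.
CODING = re.compile(rb"coding[:=]\s*([-\w.]+)")

def sniff_encoding(source: bytes, default="utf-8"):
    for line in source.splitlines()[:2]:
        if line.lstrip().startswith(b"#"):
            match = CODING.search(line)
            if match:
                return match.group(1).decode("ascii")
    return default

assert sniff_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n") == "latin-1"
assert sniff_encoding(b"x = 1\n") == "utf-8"
```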
The problem with Turing complete formats is not, as Tim imagined, that they can crash or loop forever. Pages are hosted in software that encapsulates their state and can recover if certain resource limits are exceeded. When formats can compute, they must also be secure. The problem is that, as Alan Kay says, we don't know how to compute. We show no restraint. We make interfaces which are terrible:
I have discovered that there are two types of command interfaces in the world of computing: good interfaces and user interfaces.
Daniel J. Bernstein, The qmail security guarantee
And we, as a society, presently reduce our content to the size of a postage stamp amidst a cacophony of social media widgets, trackbacks, advertisements, navigation elements, and all manner of frightful and tortuous delenda. Many of these did not even require computation to become useless.
The tendency of formats towards Turing completeness is a kind of tragedy of the commons. The vast pastures of computation are our commons. Because we don't know how to compute, we also don't know how to make the most of our formats; and our formats naturally want to be computers.
Computers do not express themselves through monitors alone. They can play sounds, connect to networks, and save files to persistent storage. Some computers control motors, or regulate heating systems. Others are being used to water plants or fly unmanned helicopters.
Computers are defined not only by what kinds of computation they can do, which is a level playing field thanks to Turing completeness (though don't try emulating GTA V in Befunge on a PDP-8), but also by what devices they are connected to.
This has important ramifications for Turing complete formats, because the current trend is to segregate them from devices as much as possible, in the name of security. Gated access is slowly becoming available. Web pages can ask for the user's geolocation, though they may not be granted it. But they can't ask for access to an arbitrary device, so you still can't, say, use CoffeeScript to power your visitor's irrigation system.
Of course, forward extensibility hooks can be used to enable device access. You can create a new HTTP header that tells the web page you're accessing how to reach your device. Security would be difficult, but solvable. It's difficult for anything to get in the way, as long as you have a general purpose computer.
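Such a hook might look like a hypothetical `Device-Access` response header, entirely my invention for illustration, naming a device and an endpoint for the user agent to gate:

```python
# A hypothetical Device-Access HTTP header (my invention, not any
# standard): the server advertises a device and an endpoint, and the
# user agent decides whether to grant the page access to it.

def parse_device_access(value):
    device, _, params = value.partition(";")
    grant = {"device": device.strip()}
    for param in params.split(";"):
        if "=" in param:
            key, _, val = param.partition("=")
            grant[key.strip()] = val.strip().strip('"')
    return grant

header = 'irrigation; endpoint="https://garden.example/valves"'
assert parse_device_access(header) == {
    "device": "irrigation",
    "endpoint": "https://garden.example/valves",
}
```

As with geolocation, the header would only be an advertisement; the browser, not the page, would decide whether access is granted.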
I once read a post proposing that a web page should be served as data together with its own browser.js file to do the actual rendering work. (I've been unable to locate this post; please tell me if you find it!) This is just like what Alan Kay was describing in his story.
If we keep making new operating systems in old operating systems, where will it end? Will it one day be easier to port old emulators, like JSMESS, than to port old software? When the playing field is level, the game will be long.
I'm writing these articles in Markdown in emacs. A simple text format, in a portable textual operating system. Yet if I wanted to embed a `script` element, I could do so. You could watch a 6502 chip get emulated, or play a Minecraft derivative. You can't limit the general purpose computer. But power entails responsibility, and taming the computer is no easy task.