Gallimaufry of Whits
Being for the Month of 2008-02
These are quick notes taken by Sean B. Palmer on the Semantic Web, Python
and Javascript programming, history and antiquarianism, linguistics and
conlanging, typography, and other related matters. To receive these bits of
dreck regularly, subscribe to the feed. To browse other
months, check the contents.
My slightly provocative piece yesterday, who uses OWL for the DL?,
got more feedback than I expected, so I ought to explain a bit more about where
I'm coming from.
You can see my thoughts from yesterday, which started off as me just rambling
about whatever and then publishing it to go for the feedback-gusto, in a
clearer light if you read Pat Hayes's Catching
the Dreams article from several years back. Perhaps his opinion's changed
now, but back then he stated that "the overhead required by DLs, particularly
the conceptual overhead, is now a barrier and an impediment to progress". What
I'm saying certainly isn't new, and isn't uninformed.
What isn't true is my implication that nobody uses OWL tools for
the DL. Of course there is a large industrial, commercial, enterprise,
academic, intranet style usage of OWL and DL tools. But will these fairly
private things trickle over into the public side? It's nice if our government
records are stored in a database that's OWL/DL-powered, but I also want to be
able to do more mundane things like check cinema times against train times. I
can't do that mashup if the cinema and train companies won't release their data
for me to remix.
TimBL said
the other day that he feels "we may be quite close getting this to gel to the
point where it becomes just an essential part of life". My provocative thought
is: what part does OWL/DL have to play in that gelling process, of making the
Semantic Web essential? Does it count if bits and bytes that only peripherally
affect me are the only things OWL/DL has a part in? Surely not.
I don't mean to turn this into a diatribe against OWL. I'm not saying that
OWL is a bad technology (it's very well engineered), and I'm not saying that
nobody should ever use it (clearly companies are finding it useful). I aim to
question whether it's useful for the more grassroots efforts, the FOAFs and the
SIOCs and the things as yet uninvented, as those are the things that have the
most direct impact on people; the things that will make the Semantic Web a
visibly "essential part of life".
Moreover, as a Semantic Web developer, and as a Semantic Web developer who's
been working on the whole shaboodle for a particularly long time, I have a kind
of vote to cast in the form of which technologies I investigate and work on,
and that's a very personal choice to have to make. What I'm saying is that at
the moment, I'm thinking that OWL has a very small part to play in My Semantic
Web.
If you're looking to use GRDDL in HTML 5 then you might
be disappointed: there's no @profile. I asked
about this on #swig yesterday, and DanC said "I think the reason head/@profile
isn't in html5 is that it hasn't really crossed hixie's radar; i.e. it's not
used in a statistically significant portion of web content". Kjetil told me
today that it's been dropped because it's not cut-'n'-paste compatible, which
we had a little debate about.
I had suggested that the transformation link type could be added to HTML 5,
which would make @profile redundant, but of course that would break old tools.
One approach which wouldn't break old tools would be to make a profile link
type, such as the one that I proposed
on the WHATWG Wiki today, and then make the XHTML namespace be a GRDDL document
which bootstraps in the new profile link type.
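To make the backwards-compatibility point concrete, here is a minimal sketch of how a tool might accept both the old head/@profile attribute and a profile link type. It assumes the link type is spelled rel="profile" and carries the profile URI in href; the class name, function name, and example.org URIs are hypothetical, and this is not how any existing GRDDL implementation is known to work.

```python
from html.parser import HTMLParser


class ProfileFinder(HTMLParser):
    """Collect profile URIs from head/@profile and from <link rel="profile">."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head" and attributes.get("profile"):
            # Old style: a space-separated list of URIs in head/@profile.
            self.profiles.extend(attributes["profile"].split())
        elif tag == "link" and attributes.get("href"):
            # Assumed new style: rel holds a space-separated list of link types.
            rels = (attributes.get("rel") or "").lower().split()
            if "profile" in rels:
                self.profiles.append(attributes["href"])


def find_profiles(html):
    """Return every profile URI found in the given HTML text, in document order."""
    parser = ProfileFinder()
    parser.feed(html)
    parser.close()
    return parser.profiles
```

An old tool would only ever look at the attribute, while a newer one could call find_profiles and get the same list from either spelling.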
As nsh says, "when you have discussions like this, the aliens laugh at
us".
Something I've been pondering: how are we to go about creating new myths in
a period when science tends to drown myth out? Tolkien failed to have his
legendarium thought of as a myth because it got seen instead as one of the
archetypes of a new genre, fantasy. If nobody had created a work subsequently
that was imitative of it, perhaps it would have been classed as just a very
recent version of a myth. But myths do tend to have that quality of being
anonymous, and Tolkien's legendarium was really his own.
We do still have relatively new interesting folklores such as urban legends,
ufologists, and things like that. We don't, however, have myths as they're
generally understood, where the once upon a time doesn't matter. It's like with
a true myth, you don't think of it as fiction: it's something from the
wellsprings of history. That makes it feel ethnographically real, I
think, in a way in which modern fantasy simply is not. It's a quality that's a
bit hard to put a finger on.
Perhaps the point about it is that it's in some respect like trying to make
a Stradivarius. All of the Stradivari that are going to be made have already
been made, by definition. The situation mightn't be quite so drastic with
mythology but on the other hand it may be. That doesn't stop people from making
new violins, but it might take some technical (or, in the case of myth,
imaginative) leap before any are made with the same kind of reputation.
I've published some Notes on Coleridge's Kubla
Khan, mainly on ascertaining when and where the poem was written. Many
thanks to Terje Bless for his kind and detailed assistance with editing.
Manually checking HTML5 is tiresome, so I've dusted off my old Validate With
Logos project, and made a check referer script that I can link to, and a litmus
stylesheet script. The litmus is basically to be referenced as a normal
stylesheet, but then it generates one of the following three lines:
- span.check:after { content: " Only works on inamidst.com!"; }
- span.check:after { content: " \002713"; color: green; }
- span.check:after { content: " \002717"; color: red; }
The first is if it's being used on a document outside of inamidst.com, the
second is if it's being used on a valid HTML5 document, and the third is if
it's being used on an invalid HTML5 document. There are some positive and negative tests for it if
you want to see it in action, and I've also used it at the bottom of my
Coleridge article.
So now, using this mechanism, I can tell from a glance whether a document is
valid HTML5 or whether it needs fixing.
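The decision the litmus makes can be sketched roughly as follows. This is only an illustration of the three cases described above, not the actual script: the function name is hypothetical, the validity check is assumed to have been done elsewhere and passed in as a boolean, and the assumption that the Referer header is what identifies the calling document is a guess.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "inamidst.com"  # the only site the litmus is meant to serve


def litmus_css(referer, is_valid_html5):
    """Return the one CSS rule the litmus stylesheet would emit.

    referer: URL of the document that linked to the stylesheet (may be None).
    is_valid_html5: result of a validity check performed elsewhere (assumed).
    """
    host = urlparse(referer or "").hostname or ""
    on_site = host == ALLOWED_HOST or host.endswith("." + ALLOWED_HOST)
    if not on_site:
        return 'span.check:after { content: " Only works on inamidst.com!"; }'
    if is_valid_html5:
        # \002713 is the CSS escape for a check mark.
        return r'span.check:after { content: " \002713"; color: green; }'
    # \002717 is the CSS escape for a ballot cross.
    return r'span.check:after { content: " \002717"; color: red; }'
```

A page then only needs a span with class "check" somewhere in it, and the generated rule appends the tick, the cross, or the warning after it.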
Morbus just published his essay Resources not Services. "It has
taken me 11 years, I think, to come to a simple conclusion: I prefer building
resources over services. I define a 'resource' as content you may be
able to interact with; a 'service', on the other hand, is primarily
something you interact with, with content as complementary."
His Video Underbelly is just
the sort of thing that I should be working on, and am. Whereas Morbus is
obsessed with videos and Fort and so on, I'm obsessed with Shakespeare and Fort
and so on. It's not so much the object of the obsession as the obsession
itself; but it's the object of the obsession inasmuch as it drives the
obsession. When you think about single monumental achievements of people like
Dr. Johnson, Virginia Woolf, or Sir Thomas Browne, you might think about their
largest works (e.g. the Dictionary), but if you know more than a trivial amount
about them, it's all of their varied and dazzling output which is
interesting. When you focus on what Morbus calls "trickling at content", like
Simon Rodia and Alfred Wainwright did so brilliantly, it's possible to create a
pretty big lake.
I wrote a little message to Aaron Swartz today called Printing
Made Easier. The general idea, for people who might not be conversant with
the particulars described in that message, is to have a very low-barrier method
for getting works printed. The benefit is that printed works tend to be more
persistent, in terms of their actually physically lasting (we don't know how
digital information is going to fare yet, really), and in terms of their being
"fixed"—obviating the whole "accessed on ..." problem.
In other words, I figure that if people want to publish their poetry or a
little academic note or something like that, they should be able to format
their manuscript and send it off somewhere to be printed in a kind of
journal-like thing. It should either be free of charge for the person trying to publish
their work, or entail a micropayment, which leaves the rather big and possibly
unresolvable question of who should pay for all this, especially if it got
particularly popular.
Even if just a few libraries got funding to support such a scheme, it would
seem to be a fairly good idea; you could still have some kind of entry barrier
such as using TeX or following some style guide, or something, as long as it's
not made too big a barrier. It'd be handy for all those little publications
which are worth citing, but aren't worth getting peer reviewed or going to a
traditional publisher for.
I've published a transcript of The Farmhouse of Kubla
Khan by Morchard Bishop from the Times Literary Supplement (1957) online.
It complements my notes on Kubla Khan,
which I've been researching to a considerable extent these past few days.
Coleridge is really fascinating. Cf. my sweetshop comment and mild explanation of
Coleridge from the perspective of a Shakespearean researcher.
The dictionary came about by slow evolution, though Dr. Johnson's Dictionary
is considered a milestone. The thesaurus, on the other hand, was more of a leap
of imagination as far as I understand it. The chief quality which links
Dr. Johnson and the Rev. Roget is their tremendous patience in the process of
compilation. The main reason nobody managed huge achievements like these before
is not, perhaps, that they couldn't be imagined, but that they
couldn't be comfortably undertaken. And so it is that perhaps someone idly
imagined a work like the thesaurus far before Roget got around to actually
making one.
What, I wondered yesterday morning, would be a logical successor to these
two works? Personally, I'd like to see a work of complex etymology: a book
which outlines the etymologies of words in terms of similarities between how
they evolved, outlining, for example, words of ecclesiastical Anglo-Saxon
origin in one section, and those renaissance borrowings from Latin in another.
Then the subdivisions might be based on meaning and phonology and the history
of their introduction such as who coined them; things like that.
Unlike the dictionary and thesaurus, however, it would not aim to be
particularly comprehensive. Indeed, a dictionary is meant to be
comprehensive in terms of etymology. What this work would do is to show the
relationships and connections between words. What words, for example, use dis-
as an intensifier and why? The explanations are more important than the raw
facts, because the raw facts are often very boring; phonology is irritatingly
so, for example, and yet you have to have a little knowledge of it to
understand some of the more interesting textural characteristics of a
language.
At first I suspected that the only class of people that this venture would
bring value to is the conlangers, I say a little contemptuously even though I
may be classed myself as a conlanger. But actually I think it'd have wider
appeal: I know of plenty of people who, to really understand a word,
look up its etymology. Just today I figured out what synthesis, the word,
was all about, and I was yet again astounded that I hadn't realised a word's
make-up through its lexicalisation. Chesterton's example of "holiday" is
perhaps my favourite. So anyway, a work like this would be spiffing if it
brought out the interesting elements of etymology.
I asked Daniel Biddle to give the badly-named "etythesaurus" a better name,
but he merely suggested that "what you need is a book that tells you how parts
of words can be put together to make words". In other words, an
etythesaurus...
And he's as obsessed at the mo' with lizards (and moths) as I am with
Coleridge.
I've been thinking about migrating from Equid to Mercurial for
inamidst's version control software. The main problem is that my Updates and Changes page is strongly
tied to the Equid system, which is quite customised for it and hence Mercurial
simply might not be powerful enough to replace it. I'm also a bit annoyed about
the .hg/ directories that it'll litter throughout the upload tree; I'll have to
put in an rsync ignore for them. It'd be nicer if I could maintain them all in
a completely separate shadow directory, but I've been through the Mercurial
book and manuals and I can't find any documented way to make that happen.
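The rsync ignore itself is simple enough. As a minimal sketch, assuming the upload is driven from a script, the command could be built like this; the function name and the source and destination paths are hypothetical, and only the standard rsync options -a and --exclude are used.

```python
def build_rsync_command(source, destination):
    """Build an rsync argument list that skips Mercurial metadata.

    The pattern ".hg/" has a trailing slash, so it only matches directories,
    and no leading slash, so it matches at any depth in the tree.
    """
    return ["rsync", "-a", "--exclude=.hg/", source, destination]


# Example with made-up paths; pass the list to subprocess.run(cmd, check=True)
# to actually perform the upload.
cmd = build_rsync_command("site/", "user@example.org:www/")
```

Keeping the command as a list rather than a string avoids any shell quoting problems with unusual paths.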
One of the primary reasons why I'm thinking about switching to Mercurial for
inamidst's version control software is a planned revision to Whits. Instead of
having just a big old list of entries per month, I'm going to have individual
pages for them, and some of them will have titles. I'd rather write things that
stand alone than assume their context within a larger list of entries.
I'd also like to make Whits a bit simpler, so this is a chance to make the
style and the backend system even simpler than it is now. Cody, who installed
pretty much the same software, is going in the other direction and bolting
things on and making them more complicated. I'm being careful to resist
that.
One structural change would be to have two kinds of posts. In the first
month of Whits, a lot of the content was very random, just junk notes of all
kinds of things. When I noticed that a lot of friends were reading this, I
started to make it more sensible, but I think that inadvertently I made it less
interesting. Now instead of it being random bits of interesting scripts and
thoughts and design sketches and so on, it's just me monologuing as I do on IRC
sometimes. With What Planet is This? I made sure to not use the first
person pronoun except on very rare occasions: and I'd like to do a similar sort
of thing for the titled posts in the planned Whits redesign.
Not entirely sure it'll be better, but I think it'll entice me to make more
output.
The version problem came in when I was thinking about how to preserve all
the histories of the files online. Eventually I decided that I'd just not edit
the posts except to correct typos, but I realised that if I had considered this
a requirement then Equid wouldn't have covered it. Moreover it's getting pretty
slow to generate a new changeset delta, so at the very least I should probably
look into Equid optimisation.
A friend was telling me yesterday about his current digital camera. "It's a
Sony, but the way that it's shaped reminds me a lot of my old camera, a Kuhsel.
Once I was in the Alps mountaineering, based at Zermatt, and I was way up in
the mountains taking some pictures with it. As I was taking one nice shot on a
very high ridge, I fumbled and saw it fall out of my hands and a couple of
thousand feet down a slope to a bergschrund.
"There was no way I could recover it, so I went back to base with my group,
and we waited for the other group to arrive back. When they didn't arrive back,
we went out looking for them, and found them carrying an injured man. We dashed
up to help him and then they told us what had happened: this chap had noticed
something glinting in the snow on the glacier, so he went to retrieve it. But
just as he got it, there was a huge rockfall and one of the rocks hit him on
the head quite badly, and he got a concussion.
"Of course he'd found my camera, and it was almost fine! I got the film out
of it and was able to develop it, and the only problem was that the lens had
been slightly depressed into the case, which prevented the loading, but
otherwise I'd have been able to use it. The chap who found it for me was still
feeling pretty bad three days later."
N.B. It wasn't called a Kuhsel, before you go Googling, but I didn't make a
note of the actual manufacturer that he said.
Sean B. Palmer, inamidst.com