TimBL's XML 2000 Talk

The following is transcribed from TimBL's XML 2000 keynote talk on the Semantic Web, available as an MP3 (keep checking it).

The talk is obviously © TimBL 2000. This (partial) transcription is made in good faith, but I'm not a qualified stenographer, and cannot vouch for the preciseness of the transcription. If in doubt, listen to the MP3 (if you can). Slides for this talk are also available.

WIP - the talk is very rapid, so it's quite difficult to capture. I have paraphrased much of what was said. Editorial comments such as time stamps etc. are made in [square brackets].

It's great to be here. We come to conferences to figure out stuff when people are thinking like we are, and we share a vocabulary and go forward. The interesting part of life is agreeing how to go forwards, and it's hard work. One of the hardest things to do is chair a group, to be the facilitator of people. And very often the chairs and people; no one realises who is behind the standards, and I want to say thank you. Lauren Wood has been chairing the Document Object Model working group for a long time, and she's not now, but I want to say thanks, and many people will. She's been in XML from the very beginning. [clapping] Lauren, could you come up here for a moment? [Lauren: "thank you"] [more clapping]. Thank you very much Lauren. She's contributed enormously.

So this is the opening talk of the Knowledge Technology track. At midnight I heard that there will be loads of people mixed in their understanding. I apologize in advance for the vocabulary. I'm going to talk about the Semantic Web, and the philosophy. Later on there are many talks about [something about red channel]. In fact, in a way that's the most important thing to get right, the rest will just come out in the wash.

Which bits will fit in where? I'll be talking about what we've got now, what's coming together fast, and the next steps. What's the Web bit? The Semantic Web will be an information space, a mapping from a URI to a representation of some information; it creates the space, something which is navigable. You can use the primeval stuff that you have for navigating. It allows self-describing documents, which have an identifier that you can dereference to find out what the document means; everything of importance will have a URI. The Semantic Web will make heavy use of it. A URI is not a recipe, it's a thing that if you know what it is by any method, but one of the methods that we have is TCP and DNS to look up w3. "Semantic" is a great word, because you can get into endless rathole discussions about the meaning of meaning. I'm not going to talk about natural language; to me the most important stuff is the machine processable stuff: a Web of data; [5:55] of information which you can deduce... it's something you can process, hard data... you can follow more than one link, and you may have lots of solid information on the end. If you have fuzzy relations, as in natural language, they decay much more quickly.

What are the bits of data? At the basic level, the semantic, e.g. of XML in an invoice, the semantics are practically "this is a bank statement", if you feed it into a financial system it works. The test of the Semantic Web is if you can feed it in anywhere, it works. That will be the way that things are grounded for a long time to come. [7:15]

We will have a relationship (quote). Your machine will convert it from one language to another. Nobody anywhere suddenly understood the meaning of money, some rules were just applied to that, the ATM coughs up the money, and the company thinks they are paid. The interesting part is the declarative part. If you write what you mean rather than what you want done with it, it can be repurposed so much better. XML is exciting as a declarative language, no font sizes... because you can repurpose it. [8:31] So in the same way, the Semantic Web will share that [notion of repurposability].

One way of looking at the core of the Semantic Web is to imagine the databases out there: they are potentially data that are good for the Semantic Web. Same with pretty sort of weather maps and HTML tables with lots of images in them on the Web. When someone creates a new desktop database machine, they create the column names, which in fact share concepts. There's a whole bunch called zip, or zipcode, or even zipcd. [...] You can invent a URI for that column: zip, so that one of those arrows can be one single statement that says that these two zip concepts are totally equivalent to this other zip. There's a relationship between "where" and "zip", if I don't know where, supplying a zip will be an answer: that doesn't mean that a where is a zip; it's a one-way relationship. That is just a small scale link. Try and imagine me ten years ago demonstrating the World Wide Web to someone: this is really cool, you click on one document, and get another document. [10:09] Can you imagine these people before the Web saying, "that's no big deal. So you made another document come up in another window..." It's difficult to imagine, but supposing that link could go anywhere: there are other databases so that when I make the zip, there are other links in XML which link that concept of zip to each other; there's a whole morass. They are all pretty much tied into the definitive US postal service zip concept, even if the USPS don't have a central definitive term for what a zip code is, a URI with uspo in it.
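[Editorial sketch, not part of the talk: one way to picture the zip example in Python. Every URI below is hypothetical (example.org is a placeholder), as are the names EQUIVALENT, CAN_ANSWER, same_concept and answers. It only illustrates the two kinds of link described: a two-way "totally equivalent" statement between column names, and a one-way link from "where" to "zip".]

```python
# Hypothetical URIs for illustration only; none of them appear in the talk.
SHARED_ZIP = "http://example.org/uspo/terms#zip"
WHERE = "http://example.org/db1#where"

# Each local column name is declared equivalent to the one shared concept.
EQUIVALENT = {
    "http://example.org/db1#zip": SHARED_ZIP,
    "http://example.org/db2#zipcode": SHARED_ZIP,
    "http://example.org/db3#zipcd": SHARED_ZIP,
}

# One-way link: a zip can answer a "where" question, but a where is not a zip.
CAN_ANSWER = {(WHERE, SHARED_ZIP)}


def canonical(uri):
    """Map a column URI to the shared concept it is declared equivalent to."""
    return EQUIVALENT.get(uri, uri)


def same_concept(a, b):
    """Two-way check: do both URIs name the same concept?"""
    return canonical(a) == canonical(b)


def answers(question, supplied):
    """One-way check: can a value of `supplied` answer `question`?"""
    return (canonical(question), canonical(supplied)) in CAN_ANSWER
```

[With these definitions, the zip and zipcd columns count as the same concept, and a zipcode answers "where", but "where" does not answer a request for a zip.]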

So the exercise is in scaling up the system. It's really powerful to integrate information from different sources. It's a graph; a Web not a tree. The basic model of the Semantic Web is that there are things and relationships between them. [shows slide about meeting]. The tail of the arrow is a person, and the head is a conference; they have a given name and an email address. The Web is full of information which has come from different sources. If people can use URIs for different services, then the data could come from anywhere. The lat and long etc. Now we know that the person attending is actually the chair: we lay the information on top. The important part is using URIs.
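[Editorial sketch, not part of the talk: the "things and relationships" model can be pictured as a set of (subject, property, value) arrows. The URIs, property names and values below are all made up for illustration. Because both sources use the same URIs for the person and the conference, laying one set of arrows on top of the other is just a set union.]

```python
# Hypothetical identifiers; the slide's actual data is not in the transcript.
PERSON = "http://example.org/people#p1"
CONFERENCE = "http://example.org/conferences#c1"

# Arrows from one source: who attends, plus name and email.
source_a = {
    (PERSON, "attends", CONFERENCE),
    (PERSON, "givenName", "Pat"),
    (PERSON, "email", "pat@example.org"),
}

# Arrows from a second, independent source about the same URIs.
source_b = {
    (PERSON, "chairs", CONFERENCE),
    (CONFERENCE, "lat", "0.0"),   # placeholder coordinates
    (CONFERENCE, "long", "0.0"),
}

# Laying the information on top: shared URIs make the merge a plain union.
merged = source_a | source_b


def describe(graph, subject):
    """Return every (property, value) arrow leaving `subject`, sorted."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)
```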

Actually, see this blob down here with an email address? If you find another blob which has the same email, when you're organizing a conference, it's a pretty good assumption that they're the same person. If not, they have to sit in the same seat. [13:08] [laughs]. The step I want you to make if you're not sitting at the front is that you have to scale the thing up. Pick it out of a list of first name... I'll pick first name from someone else's database, so that other people look at my database and can [...].
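[Editorial sketch, not part of the talk: the "same email, same person" assumption written as a small Python function. The record layout and the name merge_by_email are hypothetical. Records are treated as the same person only when the email strings match exactly.]

```python
def merge_by_email(records):
    """Combine records that share an email address into one record each.

    Assumes every record is a dict with an "email" key, and that two
    records with the same email describe the same person.
    """
    merged = {}
    for record in records:
        # Later records add to (or overwrite) what earlier ones said.
        merged.setdefault(record["email"], {}).update(record)
    return list(merged.values())


# Two "blobs" with the same email collapse into one; the third stays apart.
blobs = [
    {"email": "pat@example.org", "givenName": "Pat"},
    {"email": "pat@example.org", "role": "chair"},
    {"email": "sam@example.org", "givenName": "Sam"},
]
people = merge_by_email(blobs)
```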

The conversion of one language to another is important for evolution of languages, to evolve from n to n+1. There was a time that if you saved a doc with n+1, then it would panic if it was an n version looking at n+1, "I cannot look into the future". I call it "the version upgrade problem". There's also the test of independent invention. Suppose that the invoices are oidigfined [?], when you want to do transatlantic commerce. The trick is that when you do version n, they have to [15:15] find out how to degrade n+1 to n. So that's a really important thing for the technology. It's important for the future.

So. The architecture, I'm going to go through the layers. On the bottom we have technology based on URIs and XML i18n. Remember that we're in KT, we're all enthusiastic about other XML stuff. In this particular talk I'm going to talk about Semantic Web stuff, information with solid processable meaning. On top of this, there is RDF and other good languages. Circles and arrows; RDF is basically circles and arrows. On top of that, the schemas are more sophisticated. HasEmailAddress, one person has an email address. Therefore an object in two contexts with the same email is the same object, that's an ontology. Integrating this is going to be difficult, but there's an industry and lots of stuff out there already. In fact, they become more powerful as they go up; I like to draw the line at logical systems. Having a universal logical language sounds fancy, but something that's more useful is that you can make a proof of a deduction, and send it to someone else. XML DigSig is nice, and allows you to sign something, and you can have more trust in it... it becomes exciting when you can connect it to proofs. Apart from all the pattern problems, the problem is that you need not just to be able to sign the document, but also a powerful language, but also what this means, what it allows you to conclude. Being able to talk about the keys allows a rich model of trust. They are just a simple "don't or do trust" thing, but now we'll be able to trust certain data in certain ways depending upon it being signed by certain keys. [19:48]

What does it mean if signed by this key - need to be able to talk about the keys in the SW, gives a rich model of trust. PKI so far has been very simple, it's much better if you have certain meaning attached to certain keys. The pinnacle (where I can retire) is if we put this all together, and end up with a system of trust. What you get... at each level... the different levels allow you to talk about different things; let me go into that in more detail. The RDF schema layer - er... you should think of the XML namespaces and XML Schema as one layer up - is a very minimalist model. RDF does not insist that any object has any particular basic properties. It has a very crude concept of thing - called a resource, and a very simple concept of property. It has things like subClass, subProperty, domain, range, and human readable text labels. It's very unconstraining; you can use it for any data model, by adding things to the core stuff. The basic stuff is not designed to impede.. constrain. It allows very wide interoperability. One end of the arrow [21:25] and one cell in a relational database, the arrow the column relationship... The next layer up is the ontology layer, so for example you can say that, "this property is transitive"; if you.. if I um... am hierarchically inferior to you, and you are hierarchically inferior to someone else, this is what hierarchives [[sic?]] are like - you can write down that some relationship is transitive, but you can't say what transitive means; that requires logic. [22:22] You've got more powerful schema for pointing out things, such as cardinality.

There's a huge ontology community out there, projects like SHOE, people getting together. Groups such as OIL, groups to talk about stuff; a lot of people to bring together here. A nice thing about the ontology layer is that if you bring together a whole lot of people here in different application areas, a lot of companies in different problem domains are now hiring ontologists, when two years ago they didn't know what an ontologist was - they thought it was something to do with sort of fossilized ants or something [laughs]. [23:15]

But the reason they're doing that is if you look at the bio-tech, there are all these chemicals, all of the genetic stuff, they need to be described, to come up with the vocabularies, and different modelling systems that fit into this layer. So if we just put in this amount of functionality, a lot of people start producing data, and they start putting this on the Web [23:45] which we can then analyze, and process about, and do really cool things with. So this is, if you like, it's integrating it with a stack, and getting it out there in some sense; it's low-hanging fruit. The crucial thing about this level is that it's not Turing Complete; you can't write programming languages in ontology language, you can't write what the things mean. Of course you can have programs that have this hand coded in, and you can get things like transitive closure. [24:30]
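[Editorial sketch, not part of the talk: what a program with transitivity "hand coded in" might look like. The ontology layer can only state that a relationship is transitive; here the meaning of that statement lives in ordinary Python. The function name and the example pairs are hypothetical.]

```python
def transitive_closure(pairs):
    """Return `pairs` plus every pair implied by transitivity.

    `pairs` is an iterable of (a, b) tuples meaning "a is related to b".
    If (a, b) and (b, c) are present, (a, c) is added, and this repeats
    until nothing new can be derived.
    """
    closure = set(pairs)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if derived <= closure:  # fixed point reached: nothing new
            return closure
        closure |= derived


# "I am hierarchically inferior to you, and you to someone else."
inferior_to = {("me", "you"), ("you", "someone else")}
all_inferior_to = transitive_closure(inferior_to)
```

[Here all_inferior_to also contains ("me", "someone else"), which was never stated directly.]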

Hopefully we'll have a lot of interconversion. [Quite a bit of the talk still to be transcribed; sorry. The talk's length is 57:48 in total.]

Sean B. Palmer