This document discusses how to derive RDF from HTML. Traditionally, it has been considered easier to derive blood from a stone, but the history, the applications, and the future of this problem are worth expounding.
As of 2004-02, GRDDL is starting to be accepted as the approach for gleaning RDF from XHTML. This document attempts to restore the background to those discussions, so that the requirements and the history are not forgotten.
<sbp> it talks about RDF and HTML. typically, the discussions have always referred to a particular mechanism for doing so: even the public-rdf-in-xhtml-tf contains that "in" preposition
<sbp> so what I wanted was a generic overview that approached the problem in its true sense: that is, there are various forms of HTML and XHTML, and authors want to be able to provide RDF alongside instances of those languages
<sbp> so first we take a look at why they want to provide RDF alongside those instances (apps), what the requirements are, whether RDF is the correct solution, and so on
<sbp> then we move onto looking at the proposed solutions, and some of the problems that have been raised whilst working on those solutions
<sbp> and this time--contrary to my previous work, and the aspect of which received the most criticism--I'm going to try to reach a solid conclusion as to which approach is to be most admired; though I think that along the way it should become obvious that it's hard to not sit on the fence
The RDF-in-HTML problem is often oversimplified by lumping together all of the HTML dialects into a heap. But HTML 4.01, XHTML 1.x, and XHTML 2.0 have to be treated very differently when considered from the point of view of RDF association.
For example, if we are to consider embedding RDF into the <script> element, we must note that in HTML 4.01 its content is CDATA, in XHTML 1.x it is PCDATA, and in XHTML 2.0 the element may not exist at all!
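The CDATA versus PCDATA difference is easy to see with two stock parsers. What follows is only a minimal sketch, assuming Python's html.parser as a stand-in for an HTML 4.01 processor and xml.etree.ElementTree as a stand-in for an XHTML 1.x one; the test document and the names ScriptProbe, html_view and xml_view are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical test document: RDF/XML placed directly inside <script>.
DOC = (
    "<html><head><title>t</title>"
    '<script type="application/rdf+xml">'
    f'<rdf:RDF xmlns:rdf="{RDF_NS}"><rdf:Description rdf:about=""/></rdf:RDF>'
    "</script></head><body></body></html>"
)


class ScriptProbe(HTMLParser):
    """Record every start tag seen, plus the raw text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; collect them all.
        if self._in_script:
            self.script_chunks.append(data)


def html_view(doc):
    """HTML-style parse: script content is opaque character data."""
    probe = ScriptProbe()
    probe.feed(doc)
    probe.close()
    return probe.tags, "".join(probe.script_chunks)


def xml_view(doc):
    """XML-style parse: script content is ordinary markup with child elements."""
    script = ET.fromstring(doc).find("head/script")
    return [child.tag for child in script]


if __name__ == "__main__":
    tags, text = html_view(DOC)
    print(tags)           # no rdf:* start tags are reported here
    print(text)           # the RDF/XML survives only as a string
    print(xml_view(DOC))  # the RDF/XML shows up as a namespaced element
```

The HTML parser never reports the rdf:RDF element at all, while the XML parser hands it over as a namespaced child of script. So the same bytes are metadata markup to one consumer and an opaque string to the other.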
}There are two relevant TAG issues:
The pre-history of metadata and HTML is impressive, and is recorded in part by DanC in http://www.w3.org/2004/01/rdxh/specbg
The timeline of RDF-in-HTML proposals is impressive. One of the first such proposals was in the original RDF FAQ, http://www.w3.org/RDF/FAQ#How, from 1999 @@. Since then, most of the work on the subject has gone into suggesting new approaches; I produced a summary of these approaches back in May 2002. In @@, the W3C set up a task force to resolve the issue, headed by Joseph Reagle, and it took the approach of aligning applications to approaches, in the usual W3C style of requirements, discussion, resolution. It didn't get to the resolution part, but it was heartening nonetheless to see requirements.
{ META: RDF Core comment
in the RDF/XML Syntax document (and issues list) on my previous work: "it
concludes that there is no single embedding method that satisfies all
applications and remains simple".
}
"GRDDL" is a bad name. It was probably intended to be like "RDDL", but that makes it very confusing when brought up in discussions as a novelty. @@ note that Dom brought this up on the pubrixt list.
The RDF in XHTML Task Force was set up to "[s]tate requirements for representing metadata in RDF within an XHTML document. Evaluate proposed solutions against those requirements." - http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003May/0000 It was therefore chartered to solve the problem only for XHTML.
Considering JMR @@ 's background as W3C policy analyst, the charters for the task force are rather short, and non-sesquipedalian:
The main points of note are that it was public, and open-ended in its rechartering. Though public, it was only quasi-publicly announced, and one has to be careful in talking about such things. Dave Beckett somehow didn't hear of it, and was slightly miffed at not having been invited to participate. But apart from that, the big names were there. (@@ culture)
The latter charter is confused as to whether Dominique Hazaël-Massieux or JMR is the chair, but I can say for certain that Dom became chair. The mailing list archives http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/ are a joy.
Under JMR, the task force was focussed on providing an XHTML 1.* derivative specification that allowed embedded RDF, but got held back by issues of entity resolution. Under Dom, the task force focussed on transforming data out of XHTML.
The scenarios:
@@ Dan Brickley raises, in the document quoted in public-rdf-in-xhtml-tf/2003Jun/0003 the issue of HEAD vs. BODY, cf. Dan Connolly's tailing issue, payloads, etc.
http://esw.w3.org/topic/EmbeddingRDFinHTML EmbeddingRDFinHTML
proto-requirements!
Why validation is a requirement:
But this leaves us in a tricky position!
if "the RDF MUST NOT have to be reformatted from RDF/XML" is still a requirement and if they still want some kind of validation, we are stuck in a difficult situation
Masayasu Ishikawa - http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0017
Solutions:
Embed in comments: public-rdf-in-xhtml-tf/2003Jun/0020 XHTML1.* 1. Hacks: a. Commenting out the RDF, b. Sticking in a script tag.
<sbp> "I actually prefer the script element over comments and wonder what led CC to use a comment" - JMR. I think we're all meant to blame Aaron for that, and I'm still not sure what the rationale was. I seem to recall him saying/admitting somewhere that it was a bad idea, but knowing Aaron there's usually a sane defence for his madnesses Nope: http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0032
Embed in object... Link! meta/@content (yuck)
<meta name="rdf-meta" content=" <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:dc='http://purl.org/dc/elements/1.1/'> <rdf:Description rdf:about='http://www.w3.org/'> <dc:title>World Wide Web Consortium</dc:title> </rdf:Description> </rdf:RDF> ">
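Part of the yuck is that a literal "<" inside an attribute value is not well-formed XML, so in XHTML the RDF/XML would have to be escaped going in and parsed a second time coming out. Here is a minimal sketch of that round trip, assuming the same rdf-meta name as above; the helper names meta_with_rdf and rdf_from_meta are hypothetical.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The same RDF/XML payload as in the meta/@content example above.
RDF = (
    "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' "
    f"xmlns:dc='{DC_NS}'>"
    "<rdf:Description rdf:about='http://www.w3.org/'>"
    "<dc:title>World Wide Web Consortium</dc:title>"
    "</rdf:Description></rdf:RDF>"
)


def meta_with_rdf(rdf_xml):
    """Build a well-formed XHTML meta element carrying escaped RDF/XML."""
    # quoteattr escapes &, < and > and picks a safe quote character.
    return f'<meta name="rdf-meta" content={quoteattr(rdf_xml)}/>'


def rdf_from_meta(meta_xml):
    """Undo the escaping: parse the element, then parse its content again."""
    meta = ET.fromstring(meta_xml)
    return ET.fromstring(meta.get("content"))


if __name__ == "__main__":
    meta = meta_with_rdf(RDF)
    print(meta)
    title = rdf_from_meta(meta).find(f".//{{{DC_NS}}}title")
    print(title.text)
```

Every consumer has to know to do the second parse, and a validator sees only a string, which is the validation problem again in another guise.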
http://www.w3.org/2002/04/htmlrdf TimBL: Embedding RDF in HTML - a TAG subissue The TimBL solution:
Does GRDDL solve the scope problem? Transformations out of blockquote?
public-rdf-in-xhtml-tf/2003Jun/0027 discusses this
a. The RDF MUST be semantically context free and inherits no context from its container document. (If it is metadata about that document, it should state rdf:about="".)
b. The RDF MUST apply only to its containing document. [FOAF]
- http://www.w3.org/2003/03/rdf-in-xml.html#req-scope
But I don't really think of scope as being a requirement.
There are a lot, but they can be classed as:
Link, Embed, Augment & Scrape
table data
Approach | HTML 4.01 | XHTML 1.* | XHTML 2 |
---|---|---|---|
Modularize | No | Nearly | Maybe |
Script | Yes | Maybe | Maybe |
GRDDL | No | Yes | Yes |
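The GRDDL row follows from the fact that the discovery step needs an XML parser. A rough sketch of that step, assuming (as I understand the GRDDL draft) a head/@profile of http://www.w3.org/2003/g/data-view and link elements with rel="transformation"; the function name and the example stylesheet URI are hypothetical, and actually running the XSLT is left out.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Profile URI as I understand the GRDDL draft; treat it as an assumption.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(xhtml_source):
    """Return the href of each transformation link in a GRDDL-profiled document."""
    root = ET.fromstring(xhtml_source)  # raises ParseError on tag soup
    head = root.find(f"{XHTML}head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for link in head.iter(f"{XHTML}link"):
        # rel is a space-separated list, so test membership, not equality.
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            hrefs.append(link.get("href"))
    return hrefs


if __name__ == "__main__":
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        f'<head profile="{GRDDL_PROFILE}"><title>t</title>'
        '<link rel="transformation" href="http://example.org/glean.xsl"/>'
        "</head><body/></html>"
    )
    print(grddl_transformations(doc))
```

Feed it an HTML 4.01 document with an unclosed link element and ET.fromstring raises ParseError before any transformation is found, hence the "No" in the first column.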
Trackback is simple enough to use the <meta> element, perhaps using DC's prefixing scheme. { SOLUTION: meta element
Every few months or so, someone will propose a "new" solution involving augmenting or changing the constructs that HTML already offers, namely the meta and link elements.
the nested meta approach... comes up everywhere. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Feb/0103 "you need yet another RDF syntax like you need a hole in the head" - Steven Pemberton http://www.dubinko.info/writing/meta/
Encoding Dublin Core in HTML (RFC 2731), John Kunze, 1999-12-1 http://www.ietf.org/rfc/rfc2731.txt
cited from public-rdf-in-xhtml-tf/2003Jun/0015
[[[
* Expressivity must go beyond flat (the existing RFC handles that)
* Validation of (X)HTML required
* The terms used in the DC model written as RDF-in-XHTML must be easily extendable to handle new DCMI terms, types, etc.
* Preferably terms are related to their namespace URI
]]]
also: DanC's HyperRDF is similar, and GRDDL... Augmeta, of course
@@ split these up into ones which *change* XHTML, and ones which *abuse* it }
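For reference, the DC prefixing scheme mentioned in these notes can be scraped into triples with very little code. This is a minimal sketch, assuming the RFC 2731 convention of a link rel="schema.DC" element declaring the prefix and meta names of the form DC.term; the class name DCMetaScraper and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class DCMetaScraper(HTMLParser):
    """Collect prefixed meta elements and the schema links that declare prefixes."""

    def __init__(self, base):
        super().__init__()
        self.base = base    # URI of the page, used as the subject of each triple
        self.schemas = {}   # prefix -> namespace URI, from link rel="schema.X"
        self.pending = []   # (prefix, term, value), from meta name="X.term"

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        name = a.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):]] = a.get("href") or ""
        elif tag == "meta" and "." in name:
            prefix, term = name.split(".", 1)
            self.pending.append((prefix, term, a.get("content") or ""))

    def triples(self):
        """Resolve prefixes; metas with an undeclared prefix are dropped."""
        return [
            (self.base, self.schemas[p] + t, v)
            for p, t, v in self.pending
            if p in self.schemas
        ]


# Hypothetical sample page, HTML 4.01 style (unclosed empty elements).
PAGE = (
    '<html><head><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'
    '<meta name="DC.title" content="World Wide Web Consortium">'
    '<meta name="keywords" content="ignored">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    scraper = DCMetaScraper("http://www.w3.org/")
    scraper.feed(PAGE)
    scraper.close()
    print(scraper.triples())
```

Every triple has the page as its subject and a literal as its object, which is exactly the flatness that the first cited requirement says must be gone beyond.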
http://www.w3.org/2003/12/rdf-in-xhtml-xslts/complete-example
@@ review latest XHTML 2.0 documentation
I'm going to naysay awhile. Perhaps I yearn for the pubrixt list days of Reaglish yore, but nonetheless I have issues with GRDDL.