HTML, Metadata, and RDF

This document discusses how to derive RDF from HTML. Traditionally, it has been considered easier to derive blood from a stone, but the history, the applications, and the future of this problem are worth expounding.

With GRDDL, as of 2004-02, starting to be accepted as the approach for gleaning RDF from XHTML, this document attempts to bring the background back into the discussion, so that the requirements and the history are not forgotten.

<sbp> it talks about RDF and HTML. typically, the discussions have always referred to a particular mechanism for doing so: even the public-rdf-in-xhtml-tf contains that "in" preposition
<sbp> so what I wanted was a generic overview that approached the problem in its true sense: that is, there are various forms of HTML and XHTML, and authors want to be able to provide RDF alongside instances of those languages
<sbp> so first we take a look at why they want to provide RDF alongside those instances (apps), what the requirements are, whether RDF is the correct solution, and so on
<sbp> then we move onto looking at the proposed solutions, and some of the problems that have been raised whilst working on those solutions
<sbp> and this time--contrary to my previous work, and the aspect of which received the most criticism--I'm going to try to reach a solid conclusion as to which approach is to be most admired; though I think that along the way it should become obvious that it's hard to not sit on the fence


{ refactor this: * the problem is oversimplified * here's why * this sucks also, this bit is too naive.

The RDF-in-HTML problem is often oversimplified by lumping together all of the HTML dialects into a heap. But HTML 4.01, XHTML 1.x, and XHTML 2.0 have to be treated very differently when considered from the point of view of RDF association.

For example, if we are to consider embedding RDF into the <script> element, we must note that in HTML 4.01 its content is CDATA, in XHTML 1.x it is PCDATA, and in XHTML 2.0 the element may not exist at all!

}
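To make the CDATA/PCDATA difference concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample document, the `TagCollector` class, and the helper names are invented for this example, and Python's `html.parser` and `xml.etree` merely stand in for "an HTML 4.01 consumer" and "an XML consumer". The same bytes are read twice: the HTML-style parser hands the RDF/XML back as opaque text, while the XML parser sees real child elements inside script, which is exactly what a (#PCDATA) content model does not allow.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: RDF/XML placed directly inside <script>.
RDF_XML = f'<rdf:RDF xmlns:rdf="{RDF_NS}"><rdf:Description rdf:about=""/></rdf:RDF>'
DOC = (
    f'<html xmlns="{XHTML_NS}"><head><title>t</title>'
    f'<script type="application/rdf+xml">{RDF_XML}</script>'
    '</head><body><p>x</p></body></html>'
)


class TagCollector(HTMLParser):
    """Records every start tag seen, plus the raw text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; collect them all.
        if self._in_script:
            self.script_text.append(data)


def html_view(source):
    """HTML-style reading: script content is character data, not markup."""
    parser = TagCollector()
    parser.feed(source)
    parser.close()
    return parser.tags, "".join(parser.script_text)


def xml_view(source):
    """XML reading: anything that looks like an element is an element."""
    root = ET.fromstring(source)
    script = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}script")
    return [child.tag for child in script]


tags, text = html_view(DOC)
print(tags)           # no rdf:* tags are reported by the HTML-style parser
print(text == RDF_XML)
print(xml_view(DOC))  # the XML parser reports the rdf:RDF element as a child
```

The sketch says nothing about XHTML 2.0, where the element may not be available to abuse in the first place.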

There are two relevant TAG issues:

First, A Potted History

The pre-history of metadata and HTML is impressive, and is recorded in part by DanC in http://www.w3.org/2004/01/rdxh/specbg

The timeline of RDF-in-HTML proposals is impressive. One of the first such proposals was in the original RDF FAQ (http://www.w3.org/RDF/FAQ#How), from 1999 @@. Since then, the majority of work on the subject has gone into suggesting new approaches; I produced a summary of these approaches back in May 2002. In @@, a task force was set up by the W3C to resolve the issue, headed by Joseph Reagle, and they took the approach of aligning applications to approaches, in the usual W3C style of requirements, discussion, resolution. They didn't get to the resolution part, but it was heartening nonetheless to see requirements.

Todos

"GRDDL" is a bad name. It was probably intended to be like "RDDL", but that makes it very confusing when brought up in discussions as a novelty. @@ note that Dom brought this up on the pubrixt list.

The Task Force

The RDF in XHTML Task Force was set up to "[s]tate requirements for representing metadata in RDF within an XHTML document. Evaluate proposed solutions against those requirements." - http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003May/0000

It was therefore only chartered to solve the problem for XHTML.

Considering JMR @@ 's background as W3C policy analyst, the charters for the task force are rather short, and non-sesquipedalian:

The main points of note are that it was public, and open-ended in its rechartering. Though public, it was only quasi-publicly announced, and one has to be careful in talking about such things. Dave Beckett somehow didn't hear of it, and was slightly miffed at not having been invited to participate. But apart from that, the big names were there. (@@ culture)

The latter charter is confused as to whether Dominique Hazaël-Massieux or JMR is the chair, but I can say for certain that Dom became chair. The mailing list archives, http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/, are a joy.

Under JMR, the task force was focussed on providing an XHTML 1.* derivative specification that allowed embedded RDF, but got held back by issues of entity resolution. Under Dom, the task force focussed on transforming data out of XHTML.

APPLICATIONS/SCENARIOS

The scenarios:

FOAF
http://www.w3.org/mid/20030609180310.GH24619@tux.w3.org
FOAF in XHTML Files, Dan Brickley (RDDL, namespace documents)
FOAF should be using linking! Note the RDFe character encoding problem.
Trackback
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0004
Scenario: Trackbacks, Joseph Reagle
Dublin Core
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0015
Scenario: Dublin Core, Dave Beckett
Also...
geoURL, Creative Commons (these don't like embedding)
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003May/0002
Atom transform, SSR (public-rdf-in-xhtml-tf/2003Aug/0002, and public-rdf-in-xhtml-tf/2003Aug/0003)
RDDL (noted in http://www.w3.org/2002/04/htmlrdf - TAG peripheral)

@@ Dan Brickley raises, in the document quoted in public-rdf-in-xhtml-tf/2003Jun/0003, the issue of HEAD vs. BODY, cf. Dan Connolly's tailing issue, payloads, etc.

http://esw.w3.org/topic/EmbeddingRDFinHTML EmbeddingRDFinHTML

proto-requirements!

REQUIREMENTS (and "proto-requirements"!)

Validation

Why validation is a requirement:

But this leaves us in a tricky position!

if "the RDF MUST NOT have to be reformatted from RDF/XML" is still a requirement and if they still want some kind of validation, we are stuck in a difficult situation

Masayasu Ishikawa - http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0017

Solutions:

Embeddability and the View Source Clan

http://www.w3.org/2002/04/htmlrdf TimBL: Embedding RDF in HTML - a TAG subissue

The TimBL solution:

Scope

Does GRDDL solve the scope problem? Transformations out of blockquote?

public-rdf-in-xhtml-tf/2003Jun/0027 discusses this

  a. The RDF MUST be semantically context free and inherits no
     context from its container document. (If it is metadata about
     that document, it should state rdf:about="".)
  b. The RDF MUST apply only to its containing document. [FOAF]

- http://www.w3.org/2003/03/rdf-in-xml.html#req-scope

But I don't really think of scope as being a requirement.

SOLUTIONS/Approaches Summary?

There are a lot, but they can be classed as:

Link, Embed, Augment & Scrape

table data

Approach     HTML 4.01   XHTML 1.*   XHTML 2
Modularize   No          Nearly      Maybe
Script       Yes         Maybe       Maybe
GRDDL        No          Yes         Yes
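The GRDDL row is presumably "No" for HTML 4.01 because a transformation-based approach needs something an XML toolchain can parse. The following is a rough Python sketch of just the discovery step, under loud assumptions: it takes the GRDDL convention to be a profile of http://www.w3.org/2003/g/data-view on the head element plus link elements with rel="transformation", and the function name and sample document are invented for illustration. It does not run any XSLT; it only finds which transformations a document names.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Assumption: this is the profile URI a GRDDL-aware document declares.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(xhtml_source):
    """Return the hrefs of transformations the document points at, if any.

    The input must be well-formed XML; tag-soup HTML 4.01 would make
    ET.fromstring raise a ParseError before we get anywhere.
    """
    root = ET.fromstring(xhtml_source)
    head = root.find(XHTML + "head")
    if head is None:
        return []
    # profile is a space-separated list of URIs.
    profiles = (head.get("profile") or "").split()
    if GRDDL_PROFILE not in profiles:
        return []
    return [
        link.get("href")
        for link in head.iter(XHTML + "link")
        if "transformation" in (link.get("rel") or "").split() and link.get("href")
    ]


# Hypothetical document; example.org/extract.xsl does not exist.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head profile="http://www.w3.org/2003/g/data-view"><title>t</title>'
    '<link rel="transformation" href="http://example.org/extract.xsl"/>'
    '</head><body/></html>'
)
print(grddl_transformations(SAMPLE))
```

Note that the RDF itself never appears in the document here; everything hinges on the referenced transformation, which is one reason this counts as scraping rather than embedding.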

Trackback is simple enough to use the <meta> element, perhaps using DC's prefixing scheme.

{ SOLUTION: meta element

The <meta> Solution

Every few months or so, someone will propose a "new" solution involving augmenting or changing the constructs that HTML already offers, namely the meta and link elements.

the nested meta approach... comes up everywhere.
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Feb/0103
"you need yet another RDF syntax like you need a hole in the head" - Steven Pemberton
http://www.dubinko.info/writing/meta/

Encoding Dublin Core in HTML (RFC 2731), John Kunze, 1999-12-1
http://www.ietf.org/rfc/rfc2731.txt
cited from public-rdf-in-xhtml-tf/2003Jun/0015

[[[
* Expressivity must go beyond flat (the existing RFC handles that)
* Validation of (X)HTML required
* The terms used in the DC model written as RDF-in-XHTML must be easily extendable to handle new DCMI terms, types, etc.
* Preferably terms are related to their namespace URI
]]]

also: DanC's HyperRDF is similar, and GRDDL... Augmeta, of course
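For a feel of what "flat" means here, a small Python sketch of the prefixing scheme follows. It assumes the familiar convention of a link with rel="schema.DC" declaring the namespace and meta names of the form DC.title; the class and function names, the sample page, and the choice of the http://purl.org/dc/elements/1.1/ namespace are illustrative assumptions, not part of any proposal discussed above. Names with more than one dot (qualifiers) are deliberately skipped rather than guessed at.

```python
from html.parser import HTMLParser


class DCMetaScraper(HTMLParser):
    """Collects schema.PREFIX links and PREFIX.term meta elements."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> namespace URI
        self.pairs = []    # (prefixed name, content)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        name = a.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel.split(".", 1)[1]] = a.get("href") or ""
        elif tag == "meta" and "." in name:
            self.pairs.append((name, a.get("content") or ""))


def triples(html, subject=""):
    """Turn flat prefixed <meta> data into (subject, property, literal) tuples.

    The empty subject mirrors rdf:about="", i.e. "this document".
    """
    scraper = DCMetaScraper()
    scraper.feed(html)
    scraper.close()
    out = []
    for name, content in scraper.pairs:
        if name.count(".") != 1:
            continue  # qualified names are beyond this flat sketch
        prefix, term = name.split(".")
        namespace = scraper.schemas.get(prefix)
        if namespace is not None:
            out.append((subject, namespace + term, content))
    return out


# Hypothetical page head using the prefixing scheme.
SAMPLE = (
    '<html><head>'
    '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'
    '<meta name="DC.title" content="HTML, Metadata, and RDF">'
    '<meta name="keywords" content="rdf, html">'
    '</head><body></body></html>'
)
print(triples(SAMPLE))
```

Every value comes out as a plain literal attached to the page itself, which is precisely the limitation the first requirement in the list above is complaining about.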

@@ split these up into ones which *change* XHTML, and ones which *abuse* it }

RDF-in-XHTML Proposal, RDXL, GRDDL

The Name's Lambda. Joe Lambda

http://www.w3.org/2003/12/rdf-in-xhtml-xslts/complete-example

@@ review latest XHTML 2.0 documentation

I'm going to naysay awhile. Perhaps I yearn for the pubrixt list days of Reaglish yore, but nonetheless I have issues with GRDDL.