Semantic Web Development

This document describes work conducted on the Semantic Web by me, Sean B. Palmer, since I became interested in the field around 2000. I've tried to record most of the work that I've done along various axes, and to tie it together to find structure in my approaches to helping out with the Semantic Web efforts.

The things listed below are mainly an historical account. For the most recent usable tools, try the Pyrple RDF API, the GRDDL Demo, and n3proc.

Notation3 and CWM

Since 2000, when it was first developed, I've been interested in documenting and writing tools that handle Notation3, a non-XML serialisation of RDF. Apart from the Rough Guide to N3 (2002), most of my efforts have gone into tools to produce, filter, and consume N3. I started with n3.py (2002), which was part of my Eep RDF toolkit. It didn't support all of Notation3, but was subsequently picked up and used as part of the RAP RDF API for PHP. Later the same year I wrote and released a different parser, afon.py, which was much more advanced and was the first N3 parser to support path syntax. But it was soon superseded by other, better parsers, and never managed to supplant CWM's notation3.py, which had been the intention. So eventually, when TimBL published his RDF BNF grammar for Notation3, I wrote n3proc, or n3p (2004), which is now used as part of both rdflib by Daniel Krech (eikeon) and 4Suite & Versa by Uche and Chimezie Ogbuji. n3proc was the first processor based on the RDF BNF, uses a novel metaparsing approach, and comes with one of the most extensive N3 test suites to date.

The first, and still the primary, tool for processing Notation3 and for doing logic and trust on the Semantic Web is Tim Berners-Lee's Closed World Machine (CWM), developed in Python with help from Dan Connolly and other Semantic Web staff and developers at the W3C. The development of Notation3 has been inextricably linked to CWM, so as well as contributing to the development of Notation3 as a language, for example by performing a Great QName Survey (2002), I've also done plenty of work on CWM itself.

I wrote a Guide to CWM in 2001, back when the code was very dense and barely documented and people didn't know how to use it: the good old days when CWM was a black art. I also wrote modules for it, including cwm_math.py (2001), cwm_crypto.py (2002), and lexp (2002), the only one so far that didn't make it into CWM. I've reported and fixed bugs in CWM, often discussed its development on #rdfig and #swig, and scheduled some of the meetings on public-cwm-talk.

Python RDF APIs

In order to write applications that handle RDF, you need a low-level set of modules for parsing RDF data, storing and querying it, and serialising it back out again. CWM was one such application in Python, but I also tried my hand at writing some of my own. Early attempts, including SWIPT (2001) and the aforementioned Eep (2002), were somewhat rudimentary efforts made whilst I learned Python. Later that same year I got around to writing the passable Eep 3, which went on to form the basis for my current masterwork, Pyrple (2003), a very generic API that has "in-memory storage with API-level querying, experimental marshalling, many utilities, and is small and minimally interdependent".

Pyrple has mainly been noticed and used by friends such as Danny Ayers and Libby Miller, but it was also packaged in the pld-linux Linux distribution. In 2004, Andrea Peltrin (deelan) wrote a variant of Pyrple called Purple, which offered MySQL storage on top of the existing backend, along with some other changes. It's been mentioned in a few places, notably in a talk presented by Nathan R. Yergler at PyCon 2005. Pyrple is, as of 2006, three years old, and I did start work on some successor modules which I wanted to build into a new API called Pynk, possibly on top of Redland. In any case, parts of this new toolset have ended up in places ranging from rdflib to the W3C's Data Access group site. Daniel Krech (eikeon) and I have also discussed merging with rdflib, but no such merge has yet transpired.

FOAF and Applications

I've built numerous applications on top of Semantic Web tools, using mainly CWM and my own Python RDF APIs. One of the first was RDF Lint (2001), which was an RDF Schema consistency checker implemented in Notation3 for use in CWM, and based upon TimBL's Schema Validation I. RDF Lint was very useful, for example, in my development of the EARL RDF Schema. Also in the same year I wrote the first ever RDF Wiki in Python and a Coördinates Processor in Notation3.

The Friend of a Friend (FOAF) project, run by Libby Miller and Dan Brickley, has been going since more or less the time I became interested in the Semantic Web. I've had a FOAF file since July 2001, and have written two services to facilitate its use. The first, FOAFQ (2003), uses a minilanguage to enable easy querying of FOAF data that Libby had put on the web. Later I wrote FOAFCite, which has an even simpler interface for getting quick excerpts to paste into FOAF files, using data collected by Christopher Schmidt's bot Julie.

After I'd written Pyrple, I started work on two major RDF applications based on top of it: RDFe and Hoot. RDFe is an RDF editor using a very novel approach. Hoot was to be an OWL Species Checker, but though it passed the WebONT test suite, there were still some bugs in it and as a result I didn't get around to publishing hoot.py. My GRDDL client application, garner.py, also uses Pyrple.

As well as these high-level applications, I've also written a couple of papers about some very low-level applications, both in 2003. The first, Pondering RDF Path, exposed an inherent flaw in the RDF Path applications of the time and proposed a new path syntax that avoided the same trap. The second was GraphSL, an RDF Graph Schema Language, which was born of my work on EARL and the XML Accessibility Guidelines. It's a "different kind of schema language for RDF that lets you make subsets of RDF graphs against which you can validate instances".

RDF and HTML

Before I was interested in the Semantic Web I was intrigued by HTML development, and indeed it was the gateway into my travails with RDF. So it was only natural that I should work on the relationship between the two technologies. Mainly this consisted of working out how to embed RDF in HTML. I experimented with methods of doing this, for example working on Augmented Metadata in XHTML with Murray Altheim in May 2002. Later in the same month I wrote a large summary of all the methods of doing so, called RDF in HTML: Approaches, which defined the state of the art at the time and was referenced informally from the RDF Syntax Specification two years later, when the RDF Core WG wrapped up their work on fixing RDF. Later, when Phil Ringnalda wrote about the slow state of RDF-in-HTML development, I wrote a polite but firm response assuring him that people were doing the best they could.

When Dan Connolly took over as chair of the RDF-in-XHTML Taskforce in 2003 (see notes), the focus in this area moved to GRDDL, and I set up a GRDDL Demo service (2004), as well as a naïve client in bash in 2006. Recently, I asked DanC on #swig to reconsider the relationship between GRDDL and Microformats.

As well as putting RDF in HTML, I've also attempted to get HTML from RDF by various methods. The most notable of these is HTML in RDF/Notation3 (2002), which took a sketch of TimBL's and, exploiting some dark corners in CWM, turned it into a fragile but viable method and the first ever demonstration of getting HTML blood from an RDF stone. I also used to maintain my homepage of circa 2001 as RDF generated from rather flat Notation3 source using CWM.

Education and Publication

Learning about the Semantic Web as early as I did was slow and difficult, because there wasn't much documentation around at the time. In the process I gained many wonderful friends, and we decided to do what we could to document the slowly evolving Semantic Web movement. The first step towards doing this came when Aaron Swartz, Seth Russell, William Loughborough, and I set up the Semantic Web Agreement Group, SWAG, in January 2001.

Later in the same year I produced a widely read article constituting a Semantic Web Introduction, which was famously denigrated by Clay Shirky in one of his articles and defended by Shelley Powers, amongst others. I also wrote the rather less controversial Semantic Web in Haiku (2002), providing some light humour to learning about RDF/XML and other aspects of the SW; this was especially enjoyed by Jim Hendler, who asked that I write one about OWL, which I sadly haven't yet found my way to doing.

I wrote "Using RDF to Annotate the Semantic Web" (ILRT Technical Report 1015) with Phil Cross and Libby Miller, which was accepted to the K-CAP 2001 Knowledge Markup & Semantic Annotation Workshop, (21st October 2001, Victoria B.C., Canada). In 2004 I contributed a chapter on the Semantic Web to the book "Online Education Using Learning Objects", published by RoutledgeFalmer, ISBN 0-415-33512-4. I've served on the programme committees for the 1st Workshop on Friend of a Friend, Social Networking and the Semantic Web (1st-2nd September 2004, Galway, Ireland), and the 2nd Workshop on Scripting for the Semantic Web (11th June 2006, Budva, Montenegro).

Formats and Data

The first Semantic Web format that I worked on, that is, the first application of RDF, was the Evaluation and Report Language, EARL, for the W3C's Web Accessibility Initiative, under the aegis of the Evaluation and Repair Tools IG. The version I produced was 0.95, and work is now ongoing to take it, hopefully, to W3C Recommendation; I'm not part of those efforts so far, due to constraints of time and volition. Several interesting projects arose as a result of this work, including my merge of WCAG with EARL, my conversion of EARL 0.9 to 0.95, and an RDFS to XHTML stylesheet.

As well as EARL, I've also worked on several alternate serialisations of RDF in XML, including BSWL (2001) and the sadly ignored XENT (2003). The latter, which Daniel Biddle later suggested could be renamed AXENT, was novel in using XML only for certain constructs and non-XML syntax for others, while keeping each resulting document well-formed and easily parsable XML. In the same year I also wrote the non-XML PieNT which, being a stripped-down profile of Notation3, was Turtle before Turtle existed.

I also worked on the Atom Extensibility Framework (2003) with Sjoerd Visscher and Ken MacLeod, a non-striped alternate serialisation of RDF in XML intended for use by the Atom syndication language. It too was never taken up, despite some interest from Sam Ruby, who was present during the whole development process. Continuing with the syndication theme, in 2005 I worked on RSS 1.1, a clean-up of RSS 1.0 using more up-to-date RDF constructs, with Christopher Schmidt and Cody Woodard.

I've tended to produce more languages than data, but I have published lists of CSS properties, chords, and so on in RDF, and I also contributed to Dan Connolly's list of URI Schemes in Notation3 (2001). The push to publish more data, rather than more languages, is something that I've written about (2006) in discussion with Jo Walsh, Christopher Schmidt, Uche Ogbuji, Joe Geldart and others, and could write up in more detail.

Sean B. Palmer