Quality Assurance with RELAX NG and Python
Relevant files:
Explanation From IRC
<sbp> I have a local system in place for validating my site on the hard
drive without calling v.w.o [validator.w3.org] or anything like that
<sbp> basically I wrote a subset of the XHTML 1.0 Strict DTD in RELAX
NG, by first working out the subset of elements that I'd like and encoding the
content model in Python and having it spit out the subset that I need
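The content-model step can be sketched roughly as follows. This is not xhtmlmodels.py, whose code isn't reproduced on this page. It assumes a hypothetical, much-simplified model table (`FULL_MODELS`, which maps each element to its allowed children plus a "text allowed" flag). The sketch prunes that table to a wanted subset and prints RELAX NG compact syntax. Attributes and ordering constraints (such as head before body) are ignored, so the generated grammar is looser than the real XHTML 1.0 Strict content models. Note that `div` is a keyword in the compact syntax, so its pattern name has to be written `\div`.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Keywords of RELAX NG compact syntax; a pattern named after one of them
# (e.g. the HTML element "div") must be escaped with a leading backslash.
RNC_KEYWORDS = {
    "attribute", "default", "datatypes", "div", "element", "empty",
    "external", "grammar", "include", "inherit", "list", "mixed",
    "namespace", "notAllowed", "parent", "start", "string", "text", "token",
}

# Hypothetical, heavily simplified content models:
# element name -> (allowed child elements, whether text is allowed).
FULL_MODELS = {
    "html": (["head", "body"], False),
    "head": (["title"], False),
    "title": ([], True),
    "body": (["div", "p", "table"], False),
    "div": (["div", "p", "em"], True),
    "p": (["em", "br"], True),
    "em": ([], True),
    "br": ([], False),
    "table": (["tr"], False),
    "tr": (["td"], False),
    "td": ([], True),
}


def ref(name):
    """Pattern name for an element, escaped if it clashes with a keyword."""
    return "\\" + name if name in RNC_KEYWORDS else name


def subset(models, wanted):
    """Keep only the wanted elements and prune references to all others."""
    return {
        name: ([c for c in children if c in wanted], mixed)
        for name, (children, mixed) in models.items()
        if name in wanted
    }


def to_rnc(models, start="html"):
    """Render the content models as a RELAX NG compact-syntax grammar."""
    lines = [f'default namespace = "{XHTML_NS}"', f"start = {ref(start)}"]
    for name in sorted(models):
        children, mixed = models[name]
        parts = [ref(c) for c in children] + (["text"] if mixed else [])
        # Any mix of the allowed children (and text), or nothing at all.
        body = "(" + " | ".join(parts) + ")*" if parts else "empty"
        lines.append(f"{ref(name)} = element {name} {{ {body} }}")
    return "\n".join(lines) + "\n"


wanted = {"html", "head", "title", "body", "div", "p", "em", "br"}
print(to_rnc(subset(FULL_MODELS, wanted)))
```

Dropping `table` from the wanted set also removes it from `body`'s content model. That pruning is what turns a full DTD's worth of models into a site-specific subset.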
<sbp> then with some careful pruning and blah blah I've come up with a
nice little schema that describes, more or less, the kind of HTML that I use
(I also produced a summary of the frequencies of all the HTML elements as I
use them)
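An element-frequency summary like the one mentioned above could be produced along these lines. This is a minimal sketch using Python's standard `html.parser`, not the script actually used. Reading the pages from disk is left out, and `ElementCounter` and `element_frequencies` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Count start tags. Self-closing tags such as <br /> are counted too,
    because HTMLParser's default handle_startendtag calls handle_starttag."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def element_frequencies(documents):
    """Sum element counts over an iterable of (X)HTML source strings."""
    total = Counter()
    for source in documents:
        parser = ElementCounter()  # fresh parser per page
        parser.feed(source)
        parser.close()
        total.update(parser.counts)
    return total


pages = [
    '<p>Hello <em>world</em><br /></p>',
    '<p>Another <a href="/">page</a></p>',
]
for tag, count in element_frequencies(pages).most_common():
    print(f"{count:5d}  {tag}")
```

Sorting by frequency makes it easy to spot rarely used elements, which are candidates for pruning from the schema.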
<sbp> I wanted to just feed that into a generic RELAX NG validator,
i.e. rnv, but it sadly wasn't that simple
<sbp> the content model thing:
xhtmlmodels.py (by which I figured out which content
model to use)
<sbp> so anyway, the problem was that rnv doesn't accept entities,
doctypes, and all that kind of crap
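The entity problem is easy to reproduce with any XML parser that doesn't read the XHTML DTD. rnv isn't called here. As a stand-in, Python's `xml.etree.ElementTree` rejects `&nbsp;`, which is declared only in the DTD, but accepts the numeric reference `&#160;`. The helper name `parses_without_dtd` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_without_dtd(text):
    """Try to parse text as plain XML (no DTD is read).

    Returns (True, None) on success, or (False, error message) on failure.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


named = '<p xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</p>'
numeric = '<p xmlns="http://www.w3.org/1999/xhtml">a&#160;b</p>'
print(parses_without_dtd(named))    # fails: &nbsp; is only declared in the XHTML DTD
print(parses_without_dtd(numeric))  # (True, None)
```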
<sbp> so I ended up writing a preprocessor,
xhtmlnorm.py to check and mutate all that
stuff as necessary
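The code for xhtmlnorm.py isn't shown here. One piece of the "check and mutate" idea, applied to the DOCTYPE, might look like this minimal sketch. It assumes every page is meant to declare XHTML 1.0 Strict. The sketch verifies the public identifier, then strips the declaration so that a DTD-less validator never sees it. The regex handles only a DOCTYPE without an internal subset, and `check_and_strip_doctype` and `STRICT_FPI` are hypothetical names.

```python
import re

# Formal public identifier of XHTML 1.0 Strict.
STRICT_FPI = "-//W3C//DTD XHTML 1.0 Strict//EN"

# Optional XML declaration (group 1), then a DOCTYPE without an internal
# subset (its contents in group 2), at the very start of the document.
PROLOG_RE = re.compile(r"\A\s*(<\?xml[^>]*\?>\s*)?<!DOCTYPE\s([^>\[]*)>\s*")


def check_and_strip_doctype(text):
    """Check that the DOCTYPE is XHTML 1.0 Strict, then remove it.

    The XML declaration, if present, is kept. Raises ValueError otherwise.
    """
    m = PROLOG_RE.match(text)
    if m is None:
        raise ValueError("no DOCTYPE at the start of the document")
    if STRICT_FPI not in m.group(2):
        raise ValueError("DOCTYPE is not XHTML 1.0 Strict")
    return (m.group(1) or "") + text[m.end():]


doc = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)
print(check_and_strip_doctype(doc))
```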
<sbp> it lets you either convert named entities to numeric, get rid of
them altogether, or utf-8ise them, and has other stuff for CDATA, comments,
PIs, etc.
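The entity modes and the CDATA/comment/PI handling could be sketched as follows. This is an illustration, not xhtmlnorm.py itself. It assumes the named entities are the HTML 4 set in Python's `html.entities.name2codepoint`, and it leaves XML's five predefined entities alone. CDATA sections become escaped text, and comments and processing instructions other than the XML declaration are dropped. A single left-to-right regex scan makes whichever construct starts first win, so a CDATA section containing `<!--` is handled correctly. It is still a regex sketch, not a full parser. All function names here are hypothetical.

```python
import re
from html.entities import name2codepoint
from xml.sax.saxutils import escape

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# A comment, a CDATA section (content in group 1), or a PI that isn't <?xml ...?>.
MARKUP_RE = re.compile(r"<!--.*?-->|<!\[CDATA\[(.*?)\]\]>|<\?(?!xml\s).*?\?>", re.DOTALL)


def simplify_markup(text, keep_comments=False):
    """Turn CDATA sections into escaped text; drop comments and PIs."""
    def replace(m):
        token = m.group(0)
        if token.startswith("<![CDATA["):
            return escape(m.group(1))  # escapes &, < and >
        if token.startswith("<!--"):
            return token if keep_comments else ""
        return ""  # a processing instruction
    return MARKUP_RE.sub(replace, text)


def normalize_entities(text, mode="numeric"):
    """mode 'numeric' -> &#N;, 'drop' -> removed, 'utf8' -> literal character."""
    if mode not in ("numeric", "drop", "utf8"):
        raise ValueError(f"unknown mode: {mode}")

    def replace(m):
        name = m.group(1)
        if name in XML_PREDEFINED:
            return m.group(0)  # every XML parser already knows these
        if name not in name2codepoint:
            raise ValueError(f"unknown entity &{name};")
        code = name2codepoint[name]
        if mode == "numeric":
            return f"&#{code};"
        if mode == "drop":
            return ""
        return chr(code)  # write the result out with encoding="utf-8"

    return ENTITY_RE.sub(replace, text)


def preprocess(text, mode="numeric", keep_comments=False):
    return normalize_entities(simplify_markup(text, keep_comments), mode)


src = '<p>x&nbsp;y<!-- &copy; --><![CDATA[a < b & &nbsp;]]><?php echo 1; ?></p>'
print(preprocess(src))
```

Running `simplify_markup` first matters: inside a CDATA section `&nbsp;` is literal text, so it is escaped to `&amp;nbsp;` and left untouched by the entity pass.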
@@ More explanation and scripts...
Sean B. Palmer