Quality Assurance with RELAX NG and Python

Relevant files:

Explanation From IRC

<sbp> I have a local system in place for validating my site on the hard drive without calling v.w.o (validator.w3.org) or anything like that
<sbp> basically I wrote a subset of the XHTML 1.0 Strict DTD in RELAX NG: I first worked out the subset of elements that I'd like, encoded the content models in Python, and had it spit out the subset that I need
<sbp> then, with some careful pruning, I came up with a nice little schema that describes, more or less, the kind of HTML that I actually use (I also produced a summary of how often I use each HTML element)
<sbp> I wanted to just feed that into a generic RELAX NG validator, i.e. rnv, but it sadly wasn't that simple
<sbp> the RNC schema: xhtml.rnc
<sbp> the content model thing: xhtmlmodels.py (by which I figured out which content model to use)
<sbp> so anyway, the problem was that rnv doesn't accept entities, doctypes, and all that kind of crap
<sbp> so I ended up writing a preprocessor, xhtmlnorm.py, to check and mutate all that stuff as necessary
<sbp> it lets you either convert named entities to numeric, get rid of them altogether, or utf-8ise them, and has other stuff for CDATA, comments, PIs, etc.
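The entity handling described in the log can be sketched roughly as follows. This is not the code of xhtmlnorm.py, only a minimal illustration of the idea: the function names, the mode names, and the regular expressions are assumptions made for this example. It drops a simple DOCTYPE declaration and rewrites named entities in one of the three ways mentioned above; CDATA sections, comments, and PIs are left untouched here.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself; these need no DTD, so leave them.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Assumption: the DOCTYPE has no internal subset (no '>' inside it).
DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>\s*")

MODES = ("numeric", "utf-8", "strip")  # hypothetical mode names


def convert_entities(text, mode="numeric"):
    """Rewrite named entities such as &nbsp; according to mode."""
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % mode)

    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and unrecognised names as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        if mode == "numeric":
            return "&#%d;" % name2codepoint[name]
        if mode == "utf-8":
            # Returns the character itself; encode as UTF-8 when writing out.
            return chr(name2codepoint[name])
        return ""  # mode == "strip"

    return ENTITY.sub(replace, text)


def normalise(text, mode="numeric"):
    """Drop the DOCTYPE, then convert named entities."""
    text = DOCTYPE.sub("", text, count=1)
    return convert_entities(text, mode)
```

The normalised text can then be written to a temporary file and handed to the validator together with the schema.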

@@ More explanation and scripts...
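The first step in the log, encoding content models in Python and having it emit the wanted subset, might look something like the sketch below. It is not xhtmlmodels.py: the MODELS table, the emit_rnc function, and the ".elem" naming are all hypothetical, the content models are heavily simplified (unordered children, no attributes), and so the result is looser than the real XHTML 1.0 Strict DTD. The ".elem" suffix is only there to keep pattern names from colliding with RELAX NG compact syntax keywords such as div.

```python
# Hypothetical, heavily simplified content models: for each element, the
# child elements it may contain and whether it may contain text.
MODELS = {
    "html": {"children": ["head", "body"], "text": False},
    "head": {"children": ["title"], "text": False},
    "title": {"children": [], "text": True},
    "body": {"children": ["p", "ul", "pre"], "text": False},
    "p": {"children": ["a", "em", "br"], "text": True},
    "ul": {"children": ["li"], "text": False},
    "li": {"children": ["a", "em", "br", "p"], "text": True},
    "pre": {"children": ["a", "em"], "text": True},
    "a": {"children": ["em"], "text": True},
    "em": {"children": ["a"], "text": True},
    "br": {"children": [], "text": False},
}


def emit_rnc(models, wanted, start="html"):
    """Emit a RELAX NG compact schema covering only the wanted elements."""
    if start not in wanted:
        raise ValueError("start element must be in the wanted subset")
    lines = ["start = %s.elem" % start]
    for name in sorted(wanted):
        model = models[name]
        # Prune references to children outside the wanted subset.
        choices = ["%s.elem" % c for c in model["children"] if c in wanted]
        if model["text"]:
            choices.insert(0, "text")
        content = "(%s)*" % " | ".join(choices) if choices else "empty"
        lines.append("%s.elem = element %s { %s }" % (name, name, content))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    subset = {"html", "head", "title", "body", "p", "em", "br"}
    print(emit_rnc(MODELS, subset), end="")
```

Pruning by hand afterwards, as the log describes, would still be needed to restore ordering constraints and add attributes.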

Sean B. Palmer