Quality Assurance with RELAX NG and Python
Relevant files:
Explanation From IRC
<sbp> I have a local system in place for validating my site on the hard drive without calling v.w.o [validator.w3.org] or anything like that
<sbp> basically I wrote a subset of the XHTML 1.0 Strict DTD in RELAX NG, by first working out the subset of elements that I'd like and encoding the content model in Python and having it spit out the subset that I need
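A minimal sketch of the "encode the content model in Python and spit out the subset" step follows. It assumes a hypothetical, heavily simplified dictionary of content models (CONTENT_MODELS) and hypothetical helpers prune() and to_rnc(). It is not the code from xhtmlmodels.py. Attributes, child ordering and cardinality are ignored, so the generated schema is looser than the real XHTML 1.0 Strict DTD.

```python
# Hypothetical, heavily simplified XHTML content models: element -> allowed
# children, with "#PCDATA" meaning character data is allowed. Attributes,
# ordering and cardinality from the real DTD are deliberately ignored here.
CONTENT_MODELS = {
    "html": ["head", "body"],
    "head": ["title", "meta", "style"],
    "title": ["#PCDATA"],
    "meta": [],
    "style": ["#PCDATA"],
    "body": ["h1", "p", "ul", "table"],
    "h1": ["#PCDATA", "a", "em"],
    "p": ["#PCDATA", "a", "em", "code"],
    "ul": ["li"],
    "li": ["#PCDATA", "a", "em", "p"],
    "table": ["tr"],
    "tr": ["td"],
    "td": ["#PCDATA"],
    "a": ["#PCDATA", "em", "code"],
    "em": ["#PCDATA", "a", "code"],
    "code": ["#PCDATA"],
}


def prune(models, wanted):
    """Keep only the wanted elements and drop children outside the subset."""
    return {
        name: [c for c in kids if c == "#PCDATA" or (c in wanted and c in models)]
        for name, kids in models.items()
        if name in wanted
    }


def to_rnc(models, start="html"):
    """Emit RELAX NG compact syntax for the (pruned) models."""
    out = ['default namespace = "http://www.w3.org/1999/xhtml"',
           f"start = {start}.elem"]
    for name in sorted(models):
        kids = models[name]
        if not kids:
            body = "empty"          # nothing allowed inside
        elif kids == ["#PCDATA"]:
            body = "text"           # text only
        else:                       # any mix of text and child elements
            parts = ["text" if k == "#PCDATA" else f"{k}.elem" for k in kids]
            body = "(" + " | ".join(parts) + ")*"
        out.append(f"{name}.elem = element {name} {{ {body} }}")
    return "\n".join(out)


wanted = {"html", "head", "title", "body", "h1", "p", "ul", "li", "a", "em"}
print(to_rnc(prune(CONTENT_MODELS, wanted)))
```

The definitions carry a .elem suffix because some XHTML element names, such as div, are reserved words in the RELAX NG compact syntax and could not be used bare as pattern names.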
<sbp> then with some careful pruning and blah blah I've come up with a nice little schema that represents more or less descriptively (I also produced a summary of the frequencies of all the HTML elements as I use them) the kind of HTML that I use
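A summary of element frequencies could be produced along these lines. This is an illustrative sketch. It assumes the pages are already well-formed XML with no DTD-defined entities such as &nbsp;, which ElementTree would reject. element_frequencies() and the sample page are hypothetical.

```python
import collections
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{http://www.w3.org/1999/xhtml}p' -> 'p'; unqualified tags pass through."""
    return tag.rsplit("}", 1)[-1]


def element_frequencies(documents):
    """Count how often each element name occurs across XHTML source strings."""
    counts = collections.Counter()
    for source in documents:
        root = ET.fromstring(source)  # the default parser drops comments
        counts.update(local_name(el.tag) for el in root.iter())
    return counts


pages = [
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>A</title></head>'
    '<body><p>One <em>two</em></p><p>three</p></body></html>',
]
for name, n in element_frequencies(pages).most_common():
    print(n, name)
```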
<sbp> I wanted to just feed that into a generic RELAX NG validator, i.e. rnv, but it sadly wasn't that simple
<sbp> the content model thing: xhtmlmodels.py (by which I figured out which content model to use)
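One hedged way to work out which parts of each content model are actually needed is to record, for every element, which children appear directly under it in real pages. Intersecting that record with the DTD's model then gives a narrower, descriptive model. The helpers observed_children() and tighten() are hypothetical illustrations, not part of xhtmlmodels.py.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{http://www.w3.org/1999/xhtml}p' -> 'p'; unqualified tags pass through."""
    return tag.rsplit("}", 1)[-1]


def observed_children(documents):
    """Map each element name to the child names seen directly under it,
    adding '#PCDATA' when it directly holds non-whitespace text."""
    seen = {}
    for source in documents:
        for parent in ET.fromstring(source).iter():
            kids = seen.setdefault(local_name(parent.tag), set())
            if parent.text and parent.text.strip():
                kids.add("#PCDATA")
            for child in parent:
                kids.add(local_name(child.tag))
                # text after a child element also belongs to the parent
                if child.tail and child.tail.strip():
                    kids.add("#PCDATA")
    return seen


def tighten(dtd_models, seen):
    """Keep only the DTD-allowed children that were actually observed."""
    return {
        name: [c for c in allowed if c in seen[name]]
        for name, allowed in dtd_models.items()
        if name in seen
    }


pages = ["<html><body><p>Hi <em>there</em>!</p><ul><li>x</li></ul></body></html>"]
print(observed_children(pages))
```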
<sbp> so anyway, the problem was that rnv doesn't accept entities, doctypes, and all that kind of crap
<sbp> so I ended up writing a preprocessor, xhtmlnorm.py, to check and mutate all that stuff as necessary
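A minimal sketch of one such check-and-mutate step follows. It assumes the goal is to confirm that a page declares XHTML 1.0 Strict and then drop the DOCTYPE before handing the file to rnv. strip_doctype() and its regular expression are hypothetical. The regex does not handle a DOCTYPE with an internal subset.

```python
import re

# Formal public identifier of XHTML 1.0 Strict.
STRICT_FPI = "-//W3C//DTD XHTML 1.0 Strict//EN"

# <!DOCTYPE html PUBLIC "fpi" "system-id">, with either quote style.
# Group 2 is the public identifier, group 4 the system identifier.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+([\"'])(.*?)\1\s+([\"'])(.*?)\3\s*>"
)


def strip_doctype(source, expected_fpi=STRICT_FPI):
    """Check the DOCTYPE's public identifier, then remove the declaration
    so a validator that rejects DTDs never sees it."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        raise ValueError("no XHTML DOCTYPE found")
    if match.group(2) != expected_fpi:
        raise ValueError(f"unexpected DOCTYPE: {match.group(2)!r}")
    return source[:match.start()] + source[match.end():]


page = ('<?xml version="1.0" encoding="utf-8"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"/>')
print(strip_doctype(page))
```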
<sbp> it lets you either convert named entities to numeric, get rid of them altogether, or utf-8ise them, and has other stuff for CDATA, comments, PIs, etc.
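The three entity modes can be sketched with the standard library's html.entities.name2codepoint table. convert_entities() and its mode names are assumptions for illustration, not xhtmlnorm.py's actual interface. The five XML predefined entities are left alone, since any XML parser accepts them without a DTD.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser knows without a DTD; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Hypothetical mode names: what each one turns a code point into.
MODES = {
    "numeric": lambda cp: f"&#{cp};",  # &eacute; -> &#233;
    "remove": lambda cp: "",           # &eacute; -> (nothing)
    "utf8": chr,                       # &eacute; -> é (save the file as UTF-8)
}


def convert_entities(source, mode="numeric"):
    """Rewrite HTML named entities in source according to mode.

    Note: this also rewrites entity-like text inside comments and CDATA
    sections, which a real preprocessor would need to treat separately."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    convert = MODES[mode]

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        if name not in name2codepoint:
            raise ValueError(f"unknown entity: &{name};")
        return convert(name2codepoint[name])

    return ENTITY_RE.sub(replace, source)


print(convert_entities("caf&eacute; &amp; a&nbsp;b", "numeric"))
```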
@@ More explanation and scripts...
Sean B. Palmer 
 Valid XHTML?