Joseph Kesselman/Watson/IBM writes:
>
> >If your HTML is valid, you can try James Clark's tool SX
>
> If it isn't valid HTML, "tidy" will clean it up... and then XMLify it, if
> you use the right options. Tidy is available from the W3C's website.

Hmm. Having fought with this tidy-then-transform system for the last
day or two, can anyone tell me how they solve two (related) problems?
a) As we know, authors scatter <h1>, <h3> etc. across their documents
like pointers. My target DTD needs structured divisions. Who has some
good XSLT code to sort it out? I have evolved a dirtyish solution,
involving disable-output-escaping, but if someone else has a reliable,
clean system, I'd love to see it.
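
To be concrete about the result I am after in (a), as opposed to the
XSLT I am asking for, here is a minimal sketch in Python using the
standard library's ElementTree. The function name nest_headings and
the <div> wrapper element are assumptions for illustration only; the
real target DTD will have its own division element.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def nest_headings(body):
    """Return a copy of `body` whose flat headings open nested <div>s.

    Hypothetical helper: <div> stands in for whatever division
    element the target DTD actually defines.
    """
    root = ET.Element(body.tag, body.attrib)
    root.text = body.text
    # Stack of (heading level, container); level 0 is the root itself.
    stack = [(0, root)]
    for child in list(body):
        match = HEADING.match(child.tag)
        if match:
            level = int(match.group(1))
            # Close every open division at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            stack.append((level, div))
        # The heading, or any other element, goes into the innermost
        # division that is still open.
        stack[-1][1].append(child)
    return root


flat = ET.fromstring(
    "<body><h1>A</h1><p>a</p><h3>B</h3><p>b</p><h2>C</h2><h1>D</h1></body>"
)
nested = ET.tostring(nest_headings(flat), encoding="unicode")
print(nested)
```

Note that a skipped level (an <h3> directly under an <h1>) simply
becomes a child division of the nearest shallower heading; whether
that is acceptable depends on the target DTD.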
b) HTML allows PCDATA practically anywhere, as far as I can see, so
I get
<h3>Hello</h3>
I am the walrus
where my target DTD wants something more like
<h3>Hello</h3>
<p>I am the walrus
How do others deal with this?
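
For (b), the behaviour I have in mind can be pinned down with another
small Python sketch (again ElementTree, again not the XSLT itself).
The function name wrap_stray_text and the choice of <p> as the wrapper
are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def wrap_stray_text(parent, wrapper="p"):
    """Wrap non-blank text sitting directly inside `parent` in <wrapper>.

    Hypothetical helper: only text that is a direct child of `parent`
    is touched; text inside child elements is left alone.
    """
    # Text before the first child element lives in parent.text.
    if parent.text and parent.text.strip():
        para = ET.Element(wrapper)
        para.text = parent.text.strip()
        parent.text = None
        parent.insert(0, para)
    # Text following a child element lives in that child's .tail.
    for child in list(parent):
        if child.tail and child.tail.strip():
            para = ET.Element(wrapper)
            para.text = child.tail.strip()
            child.tail = None
            position = list(parent).index(child)
            parent.insert(position + 1, para)
    return parent


body = ET.fromstring("<body><h3>Hello</h3>I am the walrus</body>")
wrapped = ET.tostring(wrap_stray_text(body), encoding="unicode")
print(wrapped)
```

This treats every run of loose text as its own paragraph, so text
interrupted by inline markup such as <b> would be split into several
paragraphs; a real solution would need to group those runs together.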
sebastian
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list