On 08/04/2010 13:01, Robby Pelssers wrote:
> Ok... or rather, elements don't have element content. That's why they use CDATA.

A bad workaround (compared to fixing the input schema), as in particular you lose a lot of validation that the input is at least well formed, which is the cause of the present difficulty.

> And in my opinion that's not so bad, since from a data point of view these HTML tags are purely a rendition thing.

If you are going to quote the XML fragment as CDATA, it is your responsibility to check that what you are quoting is well-formed XML, since the XML parser will not do so. The posted fragment was not well formed, so it seems reasonable that an error is generated at some point once the fragment is unquoted.

If you want to do automatic fixup of the quoted fragments (which is often necessary when processing feeds, for example, with spurious "html" markup in them), then the thing to do is parse the fragment using an excessively lenient parser such as TagSoup, Tidy or my own htmlparse, but exactly which errors they will tolerate depends on the parser. I'm not sure what those three do with an unquoted < such as the one in your fragment, for example.

> But basically I see 2 options from the responses:
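A minimal sketch of the check-then-fix approach described above, written in Python for illustration. It assumes the CDATA content has already been pulled out as a string. It first tests well-formedness by wrapping the fragment in a dummy root and parsing it with `xml.etree.ElementTree`. Only if that fails does it fall back to Python's standard-library `html.parser`, used here purely as a stand-in for TagSoup, Tidy or htmlparse (none of which is shown, and each of which has its own recovery rules). The helper names (`is_well_formed`, `repair_fragment`, `_LenientRebuilder`) and the short void-element list are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: a small, incomplete list of HTML elements that never have content.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML once wrapped in a dummy root element."""
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
        return True
    except ET.ParseError:
        return False


class _LenientRebuilder(HTMLParser):
    """Re-serialise whatever the lenient HTML tokenizer recovers as well-formed XML.

    Comments and processing instructions are silently dropped (HTMLParser's
    default handlers ignore them).
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.open: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f" {name}={quoteattr(value or '')}" for name, value in attrs)
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")  # emit as an empty element
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag with no matching start: drop it
        # Close any elements still open inside the one being closed.
        while self.open:
            top = self.open.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # escape() turns &, < and > into entities, so a bare "<" becomes "&lt;".
        self.out.append(escape(data))

    def result(self) -> str:
        # Close anything still open at the end of the fragment.
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.open))


def repair_fragment(fragment: str) -> str:
    """Return the fragment unchanged if well formed, else a lenient XML repair."""
    if is_well_formed(fragment):
        return fragment
    rebuilder = _LenientRebuilder()
    rebuilder.feed(fragment)
    rebuilder.close()  # flush any text the parser is still buffering
    return rebuilder.result()


if __name__ == "__main__":
    print(repair_fragment("<p>x < y<br>z"))
```

Here an unquoted `<` like the one in the posted fragment is treated as text and escaped, missing end tags are supplied, and an already well-formed fragment passes through untouched. As with any lenient parser, the repair is a guess about what was meant, so it is no substitute for fixing the input.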