C1 characters in XML 1.0 and HTML 4


From: "Waters, Michael, Springer US" <Mike.Waters@s...>
To: "XML Developers List" <xml-dev@l...>
Date: Sat, 12 Mar 2011 18:02:09 -0500

I found some related material in the list archives, but I wanted to check my understanding of the use of C1 characters in XML 1.0 and in HTML 4.

We have a UTF-8 encoded XML document that has gone through a number of conversions and import/export routines into and out of a CMS. At all times the XML document was valid against the DTD, and in Oxygen everything seemed fine. No errors were reported in the workflow until a late stage: when rendering to HTML, Saxon reported:

net.sf.saxon.trans.DynamicError: Illegal HTML character: decimal 146

I traced the error to an article title, where there was an embedded hex character reference:

Language rights versus speakers rights
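One way to find such references before the HTML stage is to scan the raw XML source for numeric character references that resolve to the C1 range (decimal 128 to 159, hex 80 to 9F). Below is a minimal Python sketch. The function name `find_c1_char_refs` and the sample input are hypothetical, and the regex only covers the `&#NNN;` and `&#xHH;` forms, not literal C1 characters already decoded into the text.

```python
import re

# Matches decimal (&#146;) and hexadecimal (&#x92; / &#X92;) character references.
CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def find_c1_char_refs(xml_text):
    """Return (line_number, reference, code_point) for every numeric
    character reference that resolves to a C1 control (U+0080 to U+009F)."""
    hits = []
    for lineno, line in enumerate(xml_text.splitlines(), start=1):
        for m in CHAR_REF.finditer(line):
            hex_digits, dec_digits = m.groups()
            cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
            if 0x80 <= cp <= 0x9F:
                hits.append((lineno, m.group(0), cp))
    return hits

# Hypothetical sample: one C1 reference, one legitimate U+2019 reference.
sample = "<title>Language rights versus speakers&#x92; rights</title>\n<p>ok &#8217;</p>"
for lineno, ref, cp in find_c1_char_refs(sample):
    print(f"line {lineno}: {ref} -> U+{cp:04X}")
```

Here only the first line is reported, because `&#8217;` is U+2019 and lies outside the C1 range.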

Unicode character U+0092 is listed as a C1 control character (alias PRIVATE USE TWO); despite the name, it is not in a Private Use Area. I can’t see our vendor or any workflow step (un)intentionally adding that character. About the only explanation that makes sense to me is that at some point (probably in the source document) Windows-1252 encoding was used, where byte 146 (0x92) is the right single quotation mark, U+2019. (Whether that’s the appropriate character in this case is another matter.)
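The suspected mechanism can be reproduced in a few lines of Python. Text containing a right single quotation mark is encoded as Windows-1252, then decoded as if it were ISO-8859-1 (`latin-1` is Python's codec name for it). Decoding byte 0x92 with the wrong codec yields U+0092 instead of U+2019. Which conversion step actually did this in the workflow is unknown; this only illustrates how such a character can arise.

```python
# Text as it was presumably authored, with a typographic apostrophe (U+2019).
original = "speakers\u2019 rights"

# Saved as Windows-1252, the apostrophe becomes the single byte 0x92 (decimal 146).
raw = original.encode("cp1252")
print(raw)  # b'speakers\x92 rights'

# Misread as ISO-8859-1, byte 0x92 maps straight to the C1 control U+0092.
misread = raw.decode("latin-1")
print(hex(ord(misread[8])))  # 0x92

# Read with the correct codec, the apostrophe comes back.
print(raw.decode("cp1252") == original)  # True
```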

So, in all the XML processing steps, U+0092 was passed through as legal, but on output to HTML it is illegal? Surely I’m missing something here.
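XML 1.0's `Char` production admits U+0080 to U+009F, which is consistent with every XML step accepting the character; the failure only surfaced at HTML serialization. If the C1 characters really are misdecoded Windows-1252 bytes, one possible repair is to reinterpret each one as a Windows-1252 byte. The Python sketch below assumes that diagnosis holds for every C1 character in the document; the function name `repair_c1` is hypothetical. Five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in Python's `cp1252` codec, so those are left unchanged for manual review.

```python
def repair_c1(text):
    """Replace C1 controls (U+0080 to U+009F) with the character the same
    byte value denotes in Windows-1252, where one is defined."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                # Reinterpret the code point as a single Windows-1252 byte.
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # Byte is unassigned in Python's cp1252 codec; keep it for review.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)

print(repair_c1("speakers\x92 rights"))  # speakers’ rights
```

Applying this blindly is only safe if no C1 character in the document was intended as a genuine control character.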

Curiously, from my reading, HTML 5 appears to special-case Windows-1252, along with UTF-8, as an encoding that must be supported:

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0
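A related HTML 5 behaviour: the WHATWG parsing algorithm also maps numeric character references in the range 128 to 159 to their Windows-1252 characters, so `&#146;` is parsed as U+2019 rather than U+0092. Python's standard `html.unescape` follows the HTML 5 rules for character references, which makes this easy to observe. This sketches the parser's reference handling only, not the behaviour of any particular browser or of Saxon's HTML output method.

```python
import html

# HTML 5 treats references to code points 128-159 as Windows-1252 characters.
for ref in ("&#146;", "&#x92;", "&#128;"):
    ch = html.unescape(ref)
    print(ref, "->", f"U+{ord(ch):04X}")
# &#146; -> U+2019
# &#x92; -> U+2019
# &#128; -> U+20AC
```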

Best regards,

Mike Waters

Follow-Ups:
- Re: C1 characters in XML 1.0 and HTML 4
  - From: Bjoern Hoehrmann <derhoermi@g...>
- Re: C1 characters in XML 1.0 and HTML 4
  - From: Michael Kay <mike@s...>
