In article <200701021413.20468.frans.englich@t...> you write:
>These paragraphs gives good hints to the complexity in this, but it's
>not very exact("Specifically, CR, NEL ...").

I'm not sure what you find inexact about it. It lists the three
characters that must be escaped in text to avoid their being normalised
when re-read, and the five that must be escaped in attributes for the
same reason.

If you're serialising as XML 1.0 you don't need to bother escaping NEL
and LSEP (because they don't get normalised when read). But as the text
you quoted notes, a 1.0 external entity included in a 1.1 document is
parsed as XML 1.1, so if your output might be used as an external entity
in that way - rather than as a complete XML document - you will need to
escape them. You might as well escape them anyway.

I'll try to summarise:

  1-1F except CR, TAB, NL:
    Can't occur in XML 1.0. Can occur in XML 1.1 and must be escaped.

  CR:
    Always escape.

  NL, TAB:
    Escape in attribute values.

  NEL, LSEP:
    Always escape (only essential if serialising as XML 1.1).

  7F-9F except NEL:
    Always escape (only essential if serialising as XML 1.1).

  less-than, ampersand:
    Always escape.

  greater-than:
    Escape in text if it immediately follows two close-square-brackets,
    as that sequence is only allowed as the end of a CDATA marked
    section.

  single-quote, double-quote:
    Escape in attribute values quoted with the same kind of quote.

I think it's safe to always escape all of these, but always escaping
NL would make things unreadable.

-- Richard
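The summary above could be turned into code roughly as follows. This is only a minimal sketch in Python, not taken from any particular serialiser: the function name escape_xml and its parameters (in_attribute, quote, version) are made up for illustration. It assumes the input is a string of Unicode characters, rejects NUL (which neither XML version allows), and always escapes NEL, LSEP and 7F-9F regardless of version, following the "might as well escape them anyway" advice.

```python
def escape_xml(s, in_attribute=False, quote='"', version="1.1"):
    """Escape s for use as XML text or as an attribute value.

    Hypothetical helper following the summary in the message above.
    in_attribute: True if s is an attribute value delimited by `quote`.
    version: "1.0" or "1.1"; controls whether C0 controls are allowed.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be a single or double quote")

    def char_ref(cp):
        # Hexadecimal character reference, e.g. 0x0D gives "&#xD;"
        return f"&#x{cp:X};"

    out = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if ch == "<":
            out.append("&lt;")
        elif ch == "&":
            out.append("&amp;")
        elif ch == ">" and not in_attribute and i >= 2 and s[i - 2:i] == "]]":
            # "]]>" may only appear in text as the end of a CDATA section
            out.append("&gt;")
        elif in_attribute and ch == quote:
            out.append("&quot;" if ch == '"' else "&apos;")
        elif ch == "\r":
            # CR: always escape
            out.append(char_ref(cp))
        elif ch in "\n\t":
            # NL, TAB: escape only in attribute values
            out.append(char_ref(cp) if in_attribute else ch)
        elif cp < 0x20:
            # Remaining C0 controls: only representable in XML 1.1;
            # NUL is not allowed in either version
            if cp == 0 or version == "1.0":
                raise ValueError(f"U+{cp:04X} cannot occur in XML {version}")
            out.append(char_ref(cp))
        elif 0x7F <= cp <= 0x9F or cp == 0x2028:
            # 7F-9F (which includes NEL, 0x85) and LSEP: always escape
            out.append(char_ref(cp))
        else:
            out.append(ch)
    return "".join(out)
```

For example, escape_xml("a\tb\nc", in_attribute=True) leaves no literal TAB or NL in the result, while the same string passed as text comes back unchanged, which keeps ordinary multi-line text readable.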