

> I'm certainly no Relax expert, but on the face of it, it does *NOT*
> sound reasonable. In general XML and Unicode processing, one *MUST*
> handle characters with code points beyond U+FFFF. They are not
> optional.  This is true even if your programming language (Java
> perhaps?) has inadequate support for them.

What was I thinking? Don't code at 2:00 a.m., or at least don't email lists
when you can't figure stuff out at 2:00 a.m. I think this is a better
effort; all it took was some reading. Comments are, of course, still
eagerly awaited.

// Set the character, but check for surrogates
if (escapeChar <= 0xFFFF) {
  // Output directly
  readBuffer[i] = (char)escapeChar;
} else if (escapeChar <= 0x10FFFF) {
  // Greater than 16 bits, need a surrogate pair.
  // Subtracting 0x10000 leaves a value of at most 20 bits.
  escapeChar -= 0x10000;
  // Output High Surrogate (top 10 bits combined with 0xD800)
  readBuffer[i++] = ((char) (0xD800 | (escapeChar >> 10)));
  // Output Low Surrogate (bottom 10 bits combined with 0xDC00)
  readBuffer[i] = ((char) (0xDC00 | (escapeChar & 0x03FF)));
} else {
  // The value is beyond U+10FFFF, the largest Unicode code point
  Error("Character reference is too large for UTF-16",
        ((int)escapeChar).ToString("X"), null);
}
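
For anyone who wants to check the arithmetic outside the parser, here is a minimal standalone sketch of the same split. The class SurrogateSketch and the method WriteCodePoint are hypothetical names made up for illustration, not part of the parser above, and the cross-check relies on the framework's char.ConvertFromUtf32. Like the snippet above, it does not reject values in the surrogate range 0xD800-0xDFFF, which are not legal character references in XML, so a real parser would presumably check for those elsewhere.

```csharp
using System;

static class SurrogateSketch
{
    // Hypothetical helper (an assumption, not the parser's real API):
    // writes codePoint into buffer starting at index as UTF-16 code units
    // and returns how many chars were written (1 or 2).
    public static int WriteCodePoint(int codePoint, char[] buffer, int index)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        if (codePoint <= 0xFFFF)
        {
            // Fits in one code unit; output directly.
            buffer[index] = (char)codePoint;
            return 1;
        }

        // Subtracting 0x10000 leaves a value of at most 20 bits.
        int offset = codePoint - 0x10000;
        // High surrogate carries the top 10 bits.
        buffer[index] = (char)(0xD800 | (offset >> 10));
        // Low surrogate carries the bottom 10 bits.
        buffer[index + 1] = (char)(0xDC00 | (offset & 0x03FF));
        return 2;
    }

    public static void Main()
    {
        char[] buffer = new char[2];
        int written = WriteCodePoint(0x1D11E, buffer, 0);

        // prints "2 units: D834 DD1E"
        Console.WriteLine("{0} units: {1:X4} {2:X4}",
            written, (int)buffer[0], (int)buffer[1]);

        // Cross-check against the framework's own conversion; prints "True"
        Console.WriteLine(
            new string(buffer, 0, written) == char.ConvertFromUtf32(0x1D11E));
    }
}
```

U+1D11E is just a convenient test value from outside the BMP; any code point from 0x10000 to 0x10FFFF should round-trip the same way.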

All the best,
Jeff Rafter

