
  • From: Tim Bray <tbray@t...>
  • To: David Brownell <david-b@p...>, xml-dev@l...
  • Date: Tue, 24 Jul 2001 23:13:14 -0700

At 10:11 AM 24/07/01 -0700, David Brownell wrote:
>Tim Bray responding to a question of mine:
>> Seems like the smart thing is to leave it in a String
>> for now, in the hope that the rest of the Java apparatus
>> will get non-BMP-savvy in the course of time, and you'll
>> be able to send these things to renderers and other
>> string-processing-functions and the Right Thing Will
>> Happen.
>
>Except that changes the way programs working with
>individual "character" values will work:  they'll have
>to convert to array-ish (string, char[]) representations.
>
>Plus, learn that some characters consume multiple
>indices ... that starts to touch on display issues, like
>combining characters, with similar problems.

Ouch, it's worse than I thought.  One of the "nice" things
about the UTF-16 surrogate system is that if you don't have
the apparatus around to deal with astral-plane chars, you
can just obliviously treat 'em as pairs of characters you
don't know.
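
To make the pairing concrete, here is a minimal sketch of the UTF-16
surrogate arithmetic, using nothing but plain char and int math. The
class and method names (SurrogateDemo, toSurrogates, fromSurrogates)
are made up for illustration and are not part of any Java library; the
sample code point U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient
astral-plane character.

```java
// Hypothetical helper, for illustration only.
final class SurrogateDemo {

    // Split a supplementary code point (U+10000..U+10FFFF) into a
    // UTF-16 high/low surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;               // 20 significant bits
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Recombine a high/low surrogate pair into the code point it encodes.
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1D11E);
        String s = new String(pair);

        // One character as far as Unicode is concerned, two as far as
        // String is concerned.
        System.out.println(s.length());                     // prints 2
        System.out.println(Integer.toHexString(pair[0]));   // prints d834
        System.out.println(Integer.toHexString(pair[1]));   // prints dd1e
        System.out.println(Integer.toHexString(
                fromSurrogates(pair[0], pair[1])));          // prints 1d11e
    }
}
```

Code that never looks inside the string can carry the two chars along
untouched, which is the "oblivious" handling described above.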

But XML carefully rules out that possibility: production [2],
for "Char", excludes the surrogate blocks. In retrospect,
maybe that was dumb?
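
As a rough sketch of what that exclusion means for code that walks a
Java String one char at a time, the check below transcribes the ranges
of production [2] as I read them in XML 1.0. The class and method names
(XmlCharCheck, isXmlChar, and so on) are hypothetical, and the pair
decoding is done by hand rather than through any library call.

```java
// Hypothetical helper, for illustration only.
final class XmlCharCheck {

    // Ranges of XML 1.0 production [2] "Char", applied to one code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // The "oblivious" approach: test every 16-bit unit on its own.
    static boolean allUnitsAreXmlChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isXmlChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // The pair-aware approach: combine well-formed surrogate pairs first.
    static boolean allCodePointsAreXmlChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            int cp = s.charAt(i);
            boolean isHigh = cp >= 0xD800 && cp <= 0xDBFF;
            if (isHigh && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next >= 0xDC00 && next <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
                    i++; // consume the low surrogate as well
                }
            }
            if (!isXmlChar(cp)) {
                return false; // includes lone, unpaired surrogates
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E as a surrogate pair
        System.out.println(allUnitsAreXmlChars(clef));      // prints false
        System.out.println(allCodePointsAreXmlChars(clef)); // prints true
    }
}
```

Neither half of the pair matches "Char" on its own, so a char-by-char
validator has to reject the string unless it knows how to put the
halves back together.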

Which means, in effect, that Dave's right: basically you just
can't use a Java String or char in dealing with Blueberry
docs.  Or am I missing something... please?  Or we could
re-open the door to the UTF-16 hack by putting the
surrogate blocks back into [2] as part of the Blueberry
update.

Er, is anyone in the Java language team on top of what
Unicode's up to?  This is a real problem.

Somebody ship some Prozac over to Elliotte before he goes
critical... -Tim

