
  • From: Tim Bray <tbray@t...>
  • To: David Brownell <david-b@p...>, xml-dev@l...
  • Date: Mon, 23 Jul 2001 12:07:51 -0700

At 11:17 AM 23/07/01 -0700, David Brownell wrote:
>I'm curious ... seems one of the API costs of converting
>systems to Unicode 3.1 support is getting real support
>for surrogate pairs.  I may have missed something, but
>last I heard there was no such support, even in JDK 1.4:
>
>http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.html

Blecch; so in fact Java's "char" really represents
the late-unlamented UCS-2 encoding.

>Do folk prefer to deal with characters in the astral planes
>as surrogate pairs (native representation in String and in
>char arrays), or as decoded "int" values?  Both?  Is some
>other representation preferred?

Seems like the smart thing is to leave it in a String
for now, in the hope that the rest of the Java apparatus
will get non-BMP-savvy in the course of time, and you'll
be able to send these things to renderers and other
string-processing functions and the Right Thing Will
Happen.
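
For anyone who does need the decoded "int" form in the meantime, here is
a minimal sketch of pulling code points out of a String by hand, using
only the UTF-16 surrogate arithmetic. The class and method names
(SurrogateSketch, toCodePoint, decode) are made up for illustration and
are not part of any JDK or SAX API; unpaired surrogates are simply passed
through as-is, which may not be what every application wants.

```java
// Hypothetical helper, not a JDK class: decodes UTF-16 surrogate pairs by hand.
public final class SurrogateSketch {
    private SurrogateSketch() {}

    // High (leading) surrogates occupy U+D800..U+DBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy U+DC00..U+DFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a valid high/low pair into a code point in U+10000..U+10FFFF.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Walk the String and return one int per character, pairing surrogates
    // where possible. Unpaired surrogates are copied through unchanged
    // (an assumption of this sketch, not a recommendation).
    static int[] decode(String s) {
        int[] buf = new int[s.length()];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHighSurrogate(c) && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                buf[n++] = toCodePoint(c, s.charAt(i + 1));
                i++; // skip the low surrogate we just consumed
            } else {
                buf[n++] = c;
            }
        }
        int[] out = new int[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }
}
```

The data itself still lives in the String as surrogate pairs; the int
array is only a view computed on demand, so nothing has to change if the
platform later grows real non-BMP support.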

I'm wondering if there is a need for some Blueberry-aware
SAX2 utility/support interfaces, but nothing comes to mind.

-Tim

