Subject: Re: size?
From: James Clark <jjc@xxxxxxxxxx>
Date: Fri, 14 May 1999 13:26:39 +0700
Kay Michael wrote:
>
> > -----Original Message-----
> > From: Steve Muench [mailto:SMUENCH@xxxxxxxxxxxxx]
> > It turns
> > out that the notion of the "length" of a string is
> > naturally and conveniently defined if you restrict
> > yourself to single-byte character sets, but for multibyte
> > character sets the notion of "length" is less well-defined.
>
> The number of characters in a string is perfectly well-defined in XML.
The XML spec says "At user option, processors may normalize such
characters to some canonical form." Normalization can change the number
of characters in a string (by composing or decomposing characters).
Another problem is with non-BMP characters (surrogate pairs). In XML
these are treated as a single character, but the DOM counts them as two
characters.
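
Both effects can be checked with a minimal sketch. It assumes Python 3 and its standard unicodedata module, neither of which appears in the original message; the helper utf16_units is a hypothetical name introduced here to stand in for a DOM-style count of UTF-16 code units.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Count UTF-16 code units, the way a UTF-16-based API would (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


# Normalization: the same visible letter as one code point or as two.
composed = "\u00e9"                                    # e with acute, precomposed
decomposed = unicodedata.normalize("NFD", composed)    # "e" + combining acute accent
recomposed = unicodedata.normalize("NFC", decomposed)  # back to the precomposed form

print(len(composed), len(decomposed), len(recomposed))  # 1 2 1

# Non-BMP character: one character in XML terms, two UTF-16 code units.
non_bmp = "\U00010000"  # first code point outside the Basic Multilingual Plane
print(len(non_bmp), utf16_units(non_bmp))               # 1 2
```

Python's len counts code points, which is why the surrogate pair only shows up once the string is measured in UTF-16 code units.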
> It
> might not be exactly the definition that an expert in Ethiopian or
> Glagolitic might like, but it would be good enough for the rest of us.
It's more a matter of putting in a definition that speakers of many
non-English languages would find counter to their established cultural
conventions. Imagine a spec that counted the letters "i" and "j" as two
characters and every other English character as one character.
James
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list