

Joe English scripsit:

> The 'codePoint' typedef may be problematic:
> 
>     // Unicode code points (4-byte int on most systems)
>     typedef wchar_t codePoint;
> 
> The C standard makes no useful guarantees about
> the size or interpretation of 'wchar_t'.  On some
> systems it's identical to plain 'char', and even
> on systems where it's big enough to hold all of
> Unicode, there's no guarantee about what encoding
> the wcs* and *wcs functions use.  wchar_t should
> not be used in programs that are meant to generate 
> portable data and be portable themselves; you just 
> don't know what you're going to get.

I have argued privately that wchar_t is in fact the Right Thing here
despite its variability in size (typically 32 bits holding UTF-32 on
Unix platforms, 16 bits holding UTF-16 on Windows), because it makes
genx compatible with both standardized and non-standardized facilities,
most especially L"..." string literals.  Some conditional logic will be
needed to interpret the input as UTF-16 or UTF-32, which can be based
on sizeof(wchar_t).  Hypothetical platforms where sizeof(wchar_t) == 1
can be neglected.
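
A minimal sketch of what that conditional logic might look like, assuming
a 4-byte wchar_t holds UTF-32 and a 2-byte wchar_t holds UTF-16 (the C
standard promises neither).  The names next_code_point and CP_INVALID are
hypothetical and are not part of genx:

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical sentinel for malformed input; not a genx constant. */
#define CP_INVALID 0xFFFFFFFFu

/*
 * Hypothetical helper: decode one code point from s[*pos], advancing *pos.
 * The caller must ensure *pos < len.  Assumes UTF-32 when wchar_t is at
 * least 4 bytes wide and UTF-16 otherwise.
 */
static uint32_t next_code_point(const wchar_t *s, size_t len, size_t *pos)
{
    uint32_t c = (uint32_t) s[(*pos)++];

    if (sizeof(wchar_t) >= 4) {
        /* UTF-32: one unit per code point; reject surrogates and
           anything beyond U+10FFFF (including negative values). */
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return CP_INVALID;
        return c;
    }

    /* UTF-16: mask off sign extension, then combine surrogate pairs. */
    c &= 0xFFFF;
    if (c >= 0xD800 && c <= 0xDBFF) {
        if (*pos < len) {
            uint32_t lo = (uint32_t) s[*pos] & 0xFFFF;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                (*pos)++;
                return 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
            }
        }
        return CP_INVALID;      /* high surrogate with no partner */
    }
    if (c >= 0xDC00 && c <= 0xDFFF)
        return CP_INVALID;      /* stray low surrogate */
    return c;
}
```

Since sizeof(wchar_t) is a compile-time constant, a compiler will normally
drop the unused branch, so the test costs nothing at run time.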

-- 
He made the Legislature meet at one-horse       John Cowan
tank-towns out in the alfalfa belt, so that     jcowan@r...
hardly nobody could get there and most of       http://www.reutershealth.com
the leaders would stay home and let him go      http://www.ccil.org/~cowan
to work and do things as he pleased.    --Mencken, _Declaration of Independence_
