Re: Specifying a Unicode subset

To: veillard@r...
Subject: Re: Specifying a Unicode subset
From: Paul Prescod <paul@p...>
Date: Tue, 22 Oct 2002 16:36:47 -0700
Cc: xml-dev@l...
References: <AF104122-E511-11D6-BFB3-0030657E2F34@m...> <200210211640.MAA28778@m...> <20021022173710.E12115@r...>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1) Gecko/20020826

Daniel Veillard wrote:
> ...
> 
>   And using UCS-2 as the in-memory encoding is also, in a lot of cases,
> a really bad choice. Processor performance is cache-bound nowadays.
> Filling the caches with zeros for half of the data you process can simply
> trash them. I will stick to UTF-8 internally; it also allows
> some processors to use hardcoded CISC instructions for 0-terminated C
> strings (IIRC the Power line of processors has such a set of instructions).
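A minimal sketch of the size and zero-byte point above, assuming ASCII-heavy markup as input. Python has no UCS-2 codec, so UTF-16-LE stands in for it here; the two are byte-identical for characters in the Basic Multilingual Plane. The function name `encoding_stats` and the sample string are illustrative only.

```python
# Compare the encoded size and the number of zero bytes for UTF-8 vs UCS-2.
# Assumption: UTF-16-LE is used as a stand-in for UCS-2 (identical for BMP text).

def encoding_stats(text: str) -> dict:
    utf8 = text.encode("utf-8")
    ucs2 = text.encode("utf-16-le")
    return {
        "utf8_bytes": len(utf8),
        "ucs2_bytes": len(ucs2),
        # bytes.count accepts an int, so this counts 0x00 bytes
        "utf8_zero_bytes": utf8.count(0),
        "ucs2_zero_bytes": ucs2.count(0),
    }

sample = "<doc>Hello, world</doc>"
print(encoding_stats(sample))
```

For this 23-character ASCII sample, the UCS-2 form takes 46 bytes, half of them zero, while the UTF-8 form takes 23 bytes with no zero bytes at all, which is also why UTF-8 text can be passed around as NUL-terminated C strings.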

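The costs and benefits of UTF-8 are well known. Random access at the
character level becomes quite inefficient: because a UTF-8 character
occupies one to four bytes, finding the n-th character means scanning
from the start of the string. Neither UCS-2 nor UTF-8 is right as the
in-memory model for all applications.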
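A rough illustration of that random-access cost. The helper names `utf8_char_offset` and `ucs2_char_offset` are hypothetical, and the UCS-2 helper assumes every character lies in the BMP, so each one occupies exactly two bytes.

```python
# Locating the n-th character: linear scan in UTF-8, constant time in UCS-2.

def utf8_char_offset(data: bytes, index: int) -> int:
    """Return the byte offset of the index-th code point in UTF-8 data.

    Continuation bytes match 10xxxxxx; every other byte starts a code
    point, so the only way to find the n-th one is to scan: O(n).
    """
    count = 0
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte of a code point
            if count == index:
                return offset
            count += 1
    raise IndexError("character index out of range")


def ucs2_char_offset(index: int) -> int:
    """Fixed width (BMP only): the offset is simply 2 * index, O(1)."""
    return 2 * index


data = "na\u00efve caf\u00e9".encode("utf-8")  # "naïve café"
offset = utf8_char_offset(data, 9)
print(offset, data[offset:].decode("utf-8"))
```

The trade-off is the one described above: UTF-8 is compact for mostly-ASCII data but makes indexing a scan, while UCS-2 makes indexing arithmetic at the cost of space.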

  Paul Prescod

Follow-Ups:
- Re: Specifying a Unicode subset
  - From: Tim Bray <tbray@t...>

References:
- Re: Specifying a Unicode subset
  - From: tblanchard@m...
- Re: Specifying a Unicode subset
  - From: John Cowan <jcowan@r...>
- Re: Specifying a Unicode subset
  - From: Daniel Veillard <veillard@r...>