

Elliotte Rusty Harold wrote:

> 
> One should keep in mind that Chinese and similar languages are quite 
> compressed to start with, far more so than English text is. For example, 
> in UTF-8 the English word "tree" takes four bytes. The Japanese word for 
> tree takes three bytes. 
>

Good point, actually... I suppose that any language with more than 256 
code points in common use is quite likely to be one that uses roughly 
one code point per word. So languages like Arabic, which are 
alphabet-based but not very compact in UTF-8 because their characters 
have high code point values (although I'm not sure how high, so I don't 
know whether they would mainly take 2 or 3 bytes each), would be better 
served by an encoding that mainly uses a shiftable window with 
single-byte characters, I guess.
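
As a quick check of the byte counts discussed above, here is a minimal Python sketch that encodes three sample words as UTF-8 and counts the bytes. The helper name utf8_len and the choice of sample words are assumptions made for illustration; in particular, the Arabic word for "tree" is not taken from this thread. Characters in the basic Arabic block (U+0600 to U+06FF) fall in the range that UTF-8 encodes with two bytes each.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Sample words are illustrative assumptions, written as escapes so the
# code points are explicit.
samples = {
    "English 'tree'": "tree",
    "Japanese 'tree' (U+6728)": "\u6728",
    "Arabic 'tree' (U+0634 U+062C U+0631 U+0629)": "\u0634\u062c\u0631\u0629",
}

for label, word in samples.items():
    # len(word) counts code points; utf8_len(word) counts encoded bytes.
    print(f"{label}: {len(word)} code points, {utf8_len(word)} bytes")
```

Under these assumptions the English word is 4 bytes for 4 code points, the Japanese word is 3 bytes for 1 code point, and the Arabic word is 8 bytes for 4 code points, which is the overhead a windowed single-byte encoding would aim to remove.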

ABS

