Re: UTF-8 Question: e with acute accent should require two bytes, right?

From: Julian Reschke <julian.reschke@g...>
To: Jonathan Robie <jonathan.robie@r...>
Date: Fri, 28 Sep 2007 20:35:02 +0200

Jonathan Robie wrote:
> Hi Roger,
> 
> UTF-8 uses an 8 bit encoding. E9 fits in 8 bits. It doesn't fit in 7, 
> but there's no such thing as UTF-7, the problem you refer to is an ASCII 
> 7-bit problem. Since 8 bits represents twice as many characters as 7 
> bits, it's enough to represent most European languages using one byte 
> per character.
> 
> Jonathan

Ahem, this is either incorrect or at least expressed in a confusing way.

UTF-8 encodes each code point as a sequence of one to four 8-bit bytes.
Because UTF-8 can encode all Unicode code points, most of them -- all
characters with code points >= 128 -- need two or more bytes; only the
ASCII range (0 to 127) fits in a single byte.

So no: although E9 fits into 8 bits, its UTF-8 encoding requires more 
than one byte. U+00E9 is encoded as the two bytes C3 A9.
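
To check this for yourself, here is a minimal Python sketch (Python is my
choice for illustration, not something used in this thread). It encodes
U+00E9 as UTF-8 and, for contrast, as ISO-8859-1, where the same
character really is the single byte E9. The helper utf8_length is a
hypothetical name; it just restates the standard UTF-8 length ranges.

```python
# U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
ch = "\u00e9"

# UTF-8: code points >= 128 need a multi-byte sequence.
utf8_bytes = ch.encode("utf-8")

# ISO-8859-1 (Latin-1): a single-byte encoding, shown only for contrast.
latin1_bytes = ch.encode("latin-1")


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point (hypothetical helper)."""
    if code_point < 0x80:       # 0..127: ASCII range, one byte
        return 1
    if code_point < 0x800:      # up to U+07FF: two bytes
        return 2
    if code_point < 0x10000:    # up to U+FFFF: three bytes
        return 3
    return 4                    # up to U+10FFFF: four bytes


print(utf8_bytes.hex())      # c3a9
print(latin1_bytes.hex())    # e9
print(utf8_length(0xE9))     # 2
```

The single byte E9 that Jonathan has in mind is what ISO-8859-1 produces;
in UTF-8 that lone byte is not a valid encoding of any character.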

BR, Julian

References:
- UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Costello, Roger L." <costello@m...>
- Re: UTF-8 Question: e with acute accent should require two bytes, right?
  - From: Jonathan Robie <jonathan.robie@r...>
