
  • From: "Michael Kay" <mike@s...>
  • To: "'Costello, Roger L.'" <costello@m...>,<xml-dev@l...>
  • Date: Fri, 28 Sep 2007 16:22:21 +0100

Hex editors show you what they've got in memory, not what's on the disk. So
this tells you that the editor has converted the data to iso-8859-1 or
something similar for processing in memory.
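To see what is actually on disk with no editor in between, one option is to read the file in binary mode. The following is a minimal Python sketch, assuming the file was saved as UTF-8; the file name `resume.xml` and the helper `raw_bytes` are hypothetical. Opening with mode `"rb"` returns the stored bytes without any decoding, so é should show up as `c3 a9` rather than `e9`.

```python
import os
import tempfile


def raw_bytes(path: str) -> bytes:
    """Return the file's bytes exactly as stored on disk (no decoding step)."""
    with open(path, "rb") as f:
        return f.read()


# Write the element as UTF-8, then read it back without decoding.
# "resume.xml" is a hypothetical file name used only for illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "resume.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<title>My Resumé</title>")
    data = raw_bytes(path)

# Space-separated hex dump of the on-disk bytes.
print(data.hex(" "))
```

If the dump shows `c3 a9` where the editor shows `E9`, the difference was introduced by the editor, not stored in the file.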

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Costello, Roger L. [mailto:costello@m...] 
> Sent: 28 September 2007 16:13
> To: xml-dev@l...
> Subject: UTF-8 Question: e with acute accent should require two bytes, right?
> 
> Hi Folks,
>  
> Consider this element:
>  
> <title>My Resumé</title>
> 
> Notice the é (the character "e" with an acute accent). Its code point is U+00E9.
> 
> Since its code point is greater than U+007F, it requires more 
> than one byte. 
> 
> Hex E9 = decimal 233, which is binary 11101001.
> 
> I believe that it is encoded in UTF-8 as two bytes, filling the
> pattern 110xxxxx 10xxxxxx with the 11-bit value 00011 101001:
> 
>   11000011 10101001
> 
> These bytes correspond to hex C3 and hex A9.
> 
> Thus, é should be encoded in UTF-8 as:
> 
>   C3A9
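The bit arithmetic above can be checked with a short Python sketch. The function name `utf8_two_byte` is hypothetical; it handles only the two-byte range U+0080 to U+07FF, where the 11 bits of the code point are split into a top 5 (placed after `110`) and a low 6 (placed after `10`).

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a two-byte UTF-8 sequence."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers only U+0080..U+07FF")
    lead = 0b11000000 | (cp >> 6)          # 110xxxxx: top 5 of the 11 bits
    cont = 0b10000000 | (cp & 0b00111111)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])


encoded = utf8_two_byte(0xE9)
print(f"{encoded[0]:08b} {encoded[1]:08b}")  # 11000011 10101001
print(encoded.hex().upper())                 # C3A9
```

Comparing the result with `"é".encode("utf-8")` gives the same two bytes, which supports the reasoning in the question.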
> 
> The code points of the other characters (My Resum) are all 
> less than U+0080, and so the UTF-8 encoding of each of those 
> characters should be only one byte.
> 
> So, these are the bytes I expect:
> 
>  M y    R  e s  u m   é
> 4D79 2052 6573 756D C3A9
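Python's built-in `str.encode` can confirm the expected byte sequence character by character. This is an illustrative sketch, not something from the original thread:

```python
text = "My Resumé"

# Show each character's code point and its UTF-8 bytes.
for ch in text:
    print(f"{ch!r:6} U+{ord(ch):04X} -> {ch.encode('utf-8').hex().upper()}")

# The whole string: 8 one-byte characters plus the two-byte é = 10 bytes.
encoded = text.encode("utf-8")
print(len(encoded), encoded.hex().upper())  # 10 4D7920526573756DC3A9
```

Every character below U+0080 encodes to a single byte equal to its ASCII value, and only é takes two.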
> 
> Do you agree?
> 
> However, when I view the bytes in my hex editor I get this:
> 
>  M y    R  e s  u m  é
> 4D79 2052 6573 756D E9
> 
> Notice that é uses only one byte.
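A single E9 byte is exactly what ISO-8859-1 (and Windows-1252) produce for é: both encodings map é to the byte E9, which happens to equal its code point. The Python sketch below assumes, as the reply suggests, that the editor converted the text to ISO-8859-1 in memory; it compares the two encodings and shows that a lone E9 is not valid UTF-8.

```python
text = "My Resumé"
utf8 = text.encode("utf-8")
latin1 = text.encode("iso-8859-1")

print(utf8.hex().upper())    # 4D7920526573756DC3A9
print(latin1.hex().upper())  # 4D7920526573756DE9

# Decoding the UTF-8 bytes and re-encoding as ISO-8859-1 reproduces the single
# E9 byte the hex editors showed: same character, different encoding.
assert utf8.decode("utf-8").encode("iso-8859-1") == latin1

# A trailing lone E9 byte is not a complete UTF-8 sequence, so decoding fails.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e.reason)
```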
> 
> Something is wrong. Here's what I think may be wrong:
> - the editor that I am using to display the hex values is 
> displaying the code points and not the encoded bytes. However, I 
> have now tried two editors, and they both display the same 
> thing (E9). So perhaps the editor isn't the problem. 
> Perhaps I'm the problem, and am misunderstanding something. Help!
> 
> /Roger
> 
> 
> _______________________________________________________________________
> 
> XML-DEV is a publicly archived, unmoderated list hosted by 
> OASIS to support XML implementation and development. To 
> minimize spam in the archives, you must subscribe before posting.
> 
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l... List archive: 
> http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> 


