
  • From: "Waters, Michael, Springer US" <Mike.Waters@s...>
  • To: "Costello, Roger L." <costello@m...>, xml-dev@l...
  • Date: Fri, 28 Sep 2007 12:51:56 -0400

> Notice: é (the character "e" with an acute accent). It is U+00E9
> 
> Since its code point is greater than U+0080, it requires more than one
> byte. 

It depends on the encoding. In ISO 8859-1 (Latin-1) and Windows-1252 (the default for many editors), é takes only one byte: 0xE9.
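To see the difference side by side, here is a minimal Python sketch that encodes the same character in three encodings. The helper name encoded_hex is mine, purely for illustration; the codec names are the standard Python ones.

```python
# U+00E9 written as an escape, so this script's own file encoding does not matter.
ch = "\u00e9"

def encoded_hex(text, encoding):
    """Return the bytes of `text` in `encoding` as an uppercase hex string."""
    return text.encode(encoding).hex().upper()

# Print the byte sequence produced by each encoding.
for name in ("iso-8859-1", "cp1252", "utf-8"):
    print(f"{name:<10} {encoded_hex(ch, name)}")
```

The two single-byte encodings should both give E9, and only UTF-8 should give C3A9.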

> Thus, é should be encoded in UTF-8 as:
> 
>   C3A9

Yes.
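If you want to check the arithmetic by hand, the two-byte UTF-8 form packs the top 5 bits of the code point into a 110xxxxx lead byte and the low 6 bits into a 10xxxxxx trail byte. A rough Python sketch of just that case follows; utf8_two_byte is a made-up name, and the function deliberately handles only U+0080 through U+07FF.

```python
def utf8_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte form covers U+0080 through U+07FF only")
    lead = 0b11000000 | (code_point >> 6)           # 110xxxxx: top 5 bits
    trail = 0b10000000 | (code_point & 0b00111111)  # 10xxxxxx: low 6 bits
    return bytes([lead, trail])

encoded = utf8_two_byte(0xE9)
print(encoded.hex().upper())

# Cross-check against Python's built-in UTF-8 encoder.
print(encoded == "\u00e9".encode("utf-8"))
```

For 0xE9 (binary 11101001) the top bits are 00011 and the low bits are 101001, which gives the bytes C3 and A9.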

> Something is wrong.  Here's what I think may be wrong:
> - the editor that I am using to display the hex values is displaying
> the code points and not the hex values. However, I have now tried two
> editors, and they both display the same thing (E9).

PSPad has two ways to open a hex view of a file, and they give somewhat different results:

1. Open the file in the default Text Editor mode, then switch to View/Hex Edit Mode. Here encoding conversions come into play when you switch views, so what you see is the "bytes in memory."

2. Open the file directly in the Hex Editor, by selecting File/Open in Hex Editor. In this mode you get a better view of the "bytes on disk" without encoding conversions. When I come across encoding problems, this is the view that I use.

Perhaps the editors you've tried don't have the second type of hex view, which I think is what you want.
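If neither editor offers that kind of view, a few lines of script can show the "bytes on disk" without involving an editor at all. This is only a sketch: bytes_on_disk and sample.xml are hypothetical names, and the demo writes its own small UTF-8 file so that the expected result is known in advance.

```python
import os
import tempfile

def bytes_on_disk(path):
    """Return the file's raw bytes as space-separated uppercase hex pairs."""
    with open(path, "rb") as f:  # binary mode: no decoding takes place
        return " ".join(f"{b:02X}" for b in f.read())

# Hypothetical demo file containing only U+00E9, saved as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u00e9")
    dump = bytes_on_disk(path)
    print(dump)
```

Pointing bytes_on_disk at your own file should tell you whether it was really saved as UTF-8 (C3 A9) or as a single-byte encoding (E9).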

Mike Waters

