
  • From: Bill Kearney <wkearney99@h...>
  • To: xml-dev@l...
  • Date: Sat, 29 Sep 2007 14:42:40 -0400

Alessandro Triglia wrote:
> It is not correct to say that a Unicode character can be either an "ASCII
> character" or a "non-ASCII character".  It is better to say that some
> Unicode characters (those with codes below 128) have a corresponding
> character in ASCII.
>   
Who said anything about ASCII?  That just muddies up the water. 

The representation of that character as E9 presumably comes from the 
editor in question basing itself on one of the ISO-8859-x encodings 
(the character sits at E9 in only SOME of them).  Not ASCII.
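
To make that concrete, here is a rough Python sketch (my own 
illustration, not anything from the editor in question) that decodes 
the single byte E9 under a handful of ISO-8859 parts.  The list of 
parts is just a sample I picked; code points are printed as U+XXXX so 
the output doesn't depend on your terminal's encoding.

```python
# Decode the single byte 0xE9 under several ISO-8859 parts (sample list).
raw = b"\xe9"
parts = ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "iso-8859-15")

decoded = {name: raw.decode(name) for name in parts}
for name, ch in decoded.items():
    # Show the resulting Unicode code point rather than the glyph itself.
    print(f"{name}: U+{ord(ch):04X}")

# ASCII is a 7-bit code, so the byte 0xE9 is simply not valid there.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("valid ASCII:", ascii_ok)
```

Parts 1, 2 and 15 give U+00E9 (e-acute), while part 5 gives a Cyrillic 
letter and part 7 a Greek one.  Same byte, different character.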

It's not uncommon for text editors to get this wrong, or to make 
assumptions about the encoding based on several other factors.  If your 
underlying OS is 'misconfigured' it can get even more confusing.  The 
tools start trying to "help you" by translating things.  This is almost 
never helpful for developers trying to wrestle with encoding.  For the 
average wage slave just trying to cut-and-paste between different 
applications it's usually not (too much of) a problem.

And to throw another monkey into the wrench, when you use numeric 
character references in XML they're ALWAYS resolved against ISO 10646 
regardless of the document's encoding declaration.  Thus even in a 
UTF-8 XML document you would use &#xE9; for it, not &#xC3A9; (C3 A9 
is only the UTF-8 byte sequence for that character, not its code point).
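
A quick way to check this, sketched with Python's standard 
xml.etree.ElementTree (the element name "w" and the two tiny documents 
are made up for the test): a numeric character reference names the 
ISO 10646 code point, so &#xE9; parses to the same character whatever 
encoding is declared, and C3 A9 only shows up as the UTF-8 byte 
sequence for it.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in their declared encoding.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><w>&#xE9;</w>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><w>&#xE9;</w>'.encode("utf-8")

from_latin1 = ET.fromstring(latin1_doc).text
from_utf8 = ET.fromstring(utf8_doc).text

# &#xC3A9; is a different code point entirely (U+C3A9), not e-acute.
wrong = ET.fromstring(b"<w>&#xC3A9;</w>").text

# C3 A9 is what U+00E9 looks like once serialized as UTF-8 bytes.
utf8_bytes = "\u00e9".encode("utf-8")

print(f"ISO-8859-1 doc: U+{ord(from_latin1):04X}")
print(f"UTF-8 doc:      U+{ord(from_utf8):04X}")
print(f"&#xC3A9; gives: U+{ord(wrong):04X}")
print("UTF-8 bytes of U+00E9:", utf8_bytes.hex())
```

Both documents come back with U+00E9; the declaration only affects how 
the raw bytes of the file are read, not what a numeric reference means.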

Encoding, it's turtles ALL the way down.

But none of this really has anything to do with "ASCII" so just ditch 
that nonsense.

-Bill Kearney
Syndic8.com

