
  • From: Michael Sokolov <msokolov@s...>
  • To: Michael Kay <mike@s...>
  • Date: Fri, 01 Mar 2013 07:49:22 -0500

On 3/1/2013 6:36 AM, Michael Kay wrote:
>
> On 01/03/2013 06:30, David Lee wrote:
>> Curious .. Is this a common misconception ?
>> How prevalent is the confusion between XML encoding and the infoset 
>> or XDM character model of Unicode codepoints?  Encoding charset != 
>> Unicode codepoints.  Simple!!!!!???
>>
>> I hinted months ago on this list that I believe the level of 
>> misunderstanding of encoding and Unicode concepts is both high and 
>> not self-recognized, which is a deadly combination.
>> Is there more "the community" can do to make it clearer?
>>
>
> If there is, please let me know.
>
> I've been advising people how to solve character encoding issues for 
> about 100 years, but our own internal system for handling Saxon 
> license requests still gets it wrong. It ain't easy.
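
To make the encoding-vs-code-point distinction concrete, here is a minimal
Python sketch (the sample string is a hypothetical value). One sequence of
Unicode code points has different byte representations in UTF-8 and
ISO-8859-1, and decoding bytes with the wrong charset silently produces
different code points rather than an error:

```python
# One sequence of Unicode code points, two different byte encodings.
text = "Müller"  # hypothetical sample value containing an umlaut

code_points = [f"U+{ord(ch):04X}" for ch in text]
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

print(code_points)   # ['U+004D', 'U+00FC', 'U+006C', 'U+006C', 'U+0065', 'U+0072']
print(utf8_bytes)    # b'M\xc3\xbcller'
print(latin1_bytes)  # b'M\xfcller'

# Both byte sequences decode back to the same code points, but only with
# the charset that was actually used to encode them.
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("iso-8859-1") == text

# Decoding UTF-8 bytes as ISO-8859-1 does not fail; it yields different
# code points (U+00C3 U+00BC instead of U+00FC), i.e. mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)      # MÃ¼ller
```
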
The advice I always give is: use (and demand) UTF-8 everywhere and 
anywhere that you can.  Never use named entity references (this actually 
has nothing to do with character sets, but it's still my position :)).  
Use numeric character references only when absolutely necessary.  
Remember that if you use multiple character sets (or accept data from 
outside that may be in unknown or ill-defined encodings), complicated 
problems can arise in almost any layer of your software stack.  Problems 
still come up (we have an entire category of bugs in one customer's 
system related to umlauts), but demanding UTF-8 from data suppliers has 
helped us avoid at least some character-set translation issues.
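
A minimal Python sketch of the advice above, using only the standard
library's `xml.etree.ElementTree`. The `require_utf8` and `try_parse`
helpers are hypothetical, standing in for whatever gate you put in front
of supplier data. The sketch shows that a literal "ü" and the numeric
character reference `&#252;` parse to the same character, that a named
entity such as `&uuml;` is rejected without a DTD (XML predefines only
five entities), and that Latin-1 input is refused before parsing:

```python
import xml.etree.ElementTree as ET


def require_utf8(payload: bytes) -> bytes:
    """Hypothetical policy gate: pass input through only if it is valid UTF-8."""
    try:
        payload.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 at byte offset {exc.start}") from exc
    return payload


def try_parse(payload: bytes):
    """Return the root element, or None if the document is not well-formed."""
    try:
        return ET.fromstring(payload)
    except ET.ParseError:
        return None


# A literal U+00FC and the numeric character reference &#252; yield the same
# parsed text; the reference only matters when the raw character can't be sent.
literal = try_parse(require_utf8("<name>Müller</name>".encode("utf-8")))
numeric = try_parse(require_utf8(b"<name>M&#252;ller</name>"))
print(literal.text == numeric.text)  # True

# XML predefines only lt, gt, amp, apos and quot; &uuml; needs a DTD declaration.
named = try_parse(b"<name>M&uuml;ller</name>")
print(named is None)  # True

# Latin-1 bytes from a supplier are refused before they reach the parser.
try:
    require_utf8("<name>Müller</name>".encode("iso-8859-1"))
except ValueError as err:
    print("rejected:", err)  # rejected: not valid UTF-8 at byte offset 7
```

Checking the raw bytes before parsing is a deliberate choice here: an XML
parser would happily accept a correctly declared ISO-8859-1 document, so
the UTF-8-only policy has to be enforced separately.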

-Mike

