Re: Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?


From: Michael Kay <mike@s...>
To: xml-dev@l...
Date: Fri, 01 Mar 2013 18:08:24 +0000



I've been advising people how to solve character encoding issues for about 100 years, but our own internal system for handling Saxon license requests still gets it wrong. It ain't easy.

> For what it's worth, 1: Joel Spolsky's article on "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" <http://www.joelonsoftware.com/articles/Unicode.html> is quite good, I think.

The thread seems to be pointing to two conclusions:

(a) there are people who don't understand the theory, and need to be
educated (I don't know if Roger's insight about &#x80; really was a new
discovery for him; if so, I am rather shocked).

(b) but even if you do understand the theory, it's still hard to get it 
right in practice, because our systems are complex and built from 
heterogeneous components, many of which are outside our control, cannot 
be easily changed, and are poorly documented; the more complex they 
become, the more opportunities there are for data to be corrupted across 
the component boundaries.
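
On the theory in (a): a numeric character reference names a Unicode code point, whatever encoding the document declares. The following is only a minimal sketch in Python, assuming the standard library's ElementTree parser (chosen for illustration, not something used in this thread); the helper name parsed_text is made up for the example.

```python
import xml.etree.ElementTree as ET

EURO = "\u20ac"  # the euro sign is Unicode code point U+20AC

# windows-1252 stores the euro sign as the single byte 0x80 ...
euro_byte = EURO.encode("cp1252")

DECL = b'<?xml version="1.0" encoding="windows-1252"?>'

def parsed_text(body: bytes) -> str:
    """Hypothetical helper: parse a one-element windows-1252 document, return its text."""
    return ET.fromstring(DECL + body).text

# ... so a literal 0x80 byte in the document is decoded to the euro sign.
from_literal_byte = parsed_text(b"<p>\x80</p>")

# A character reference bypasses the encoding: it names the code point itself.
from_right_ref = parsed_text(b"<p>&#x20AC;</p>")  # U+20AC, the euro sign
from_wrong_ref = parsed_text(b"<p>&#x80;</p>")    # U+0080, a C1 control character

# ascii() keeps the output printable on any console encoding.
print(euro_byte, ascii(from_literal_byte), ascii(from_right_ref), ascii(from_wrong_ref))
```

So in a windows-1252 document the euro can be written either as the raw byte 0x80 or as &#x20AC;, but not as &#x80;.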

The underlying problem is that components throw bytes at each other 
without first agreeing what they mean, and because it works most of the 
time (i.e. when you speak English) people live with the problem rather 
than fixing it; and because they don't fix it, it gets worse.
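
To make that concrete, here is a small sketch in Python of two components exchanging bytes with and without an agreed encoding. The sender and receiver functions are hypothetical stand-ins, and the choice of UTF-8 on one side and windows-1252 on the other is just an assumption for the example.

```python
from typing import Tuple

def send_unlabelled(text: str) -> bytes:
    """Hypothetical sender: emits UTF-8 but does not say so."""
    return text.encode("utf-8")

def receive_guessing(payload: bytes) -> str:
    """Hypothetical receiver: assumes windows-1252 because nobody told it otherwise."""
    return payload.decode("cp1252")

def send_labelled(text: str, encoding: str = "utf-8") -> Tuple[str, bytes]:
    """Hypothetical sender that ships the encoding name alongside the bytes."""
    return encoding, text.encode(encoding)

def receive_labelled(message: Tuple[str, bytes]) -> str:
    """Hypothetical receiver that decodes with the encoding it was told about."""
    encoding, payload = message
    return payload.decode(encoding)

# Pure ASCII survives the mismatch, which is why the problem goes unnoticed.
english = receive_guessing(send_unlabelled("Price: 100"))

# The euro sign (bytes E2 82 AC in UTF-8) is misread as three separate characters.
garbled = receive_guessing(send_unlabelled("Price: \u20ac100"))

# With the encoding agreed up front, the text arrives intact.
intact = receive_labelled(send_labelled("Price: \u20ac100"))

print(ascii(english), ascii(garbled), ascii(intact))
```

The English string passes through the mismatched pair unchanged; only the non-ASCII character exposes the disagreement.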

Michael Kay
Saxonica

References:
- Re: Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?
  - From: David Lee <dlee@c...>
- Re: Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?
  - From: Michael Kay <mike@s...>
- Re: Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?
  - From: Norman Gray <norman@a...>
