
  • From: Michael Kay <mike@s...>
  • To: xml-dev@l...
  • Date: Fri, 01 Mar 2013 18:08:24 +0000



I've been advising people how to solve character encoding issues for about 100 years, but our own internal system for handling Saxon license requests still gets it wrong. It ain't easy.

> For what it's worth, 1: Joel Spolsky's article on "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" <http://www.joelonsoftware.com/articles/Unicode.html> is quite good, I think.

The thread seems to be pointing to two conclusions:

(a) there are people who don't understand the theory, and need to be 
educated (I don't know if Roger's insight about &x80 really was a new 
discovery for him; if so, I am rather shocked).

(b) but even if you do understand the theory, it's still hard to get it 
right in practice, because our systems are complex and built from 
heterogeneous components, many of which are outside our control, cannot 
be easily changed, and are poorly documented; the more complex they 
become, the more opportunities there are for data to be corrupted across 
the component boundaries.

The underlying problem is that components throw bytes at each other 
without first agreeing what they mean, and because it works most of the 
time (i.e. when you speak English) people live with the problem rather 
than fixing it; and because they don't fix it, it gets worse.
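
A minimal sketch of that failure mode, in Python. The two "components" and 
their names are hypothetical, invented purely for illustration; the only 
assumption is that one side writes UTF-8 while the other silently assumes 
Latin-1. ASCII-only text survives the mismatch, which is why the problem 
tends to go unnoticed.

```python
# Hypothetical two-component pipeline; all names here are illustrative.

def producer(text: str) -> bytes:
    """Component A: serializes text as UTF-8 but never says so."""
    return text.encode("utf-8")


def consumer_guessing(data: bytes) -> str:
    """Component B: nobody told it the encoding, so it assumes Latin-1."""
    return data.decode("latin-1")


def consumer_agreed(data: bytes, encoding: str) -> str:
    """Component B, fixed: the encoding travels with the bytes."""
    return data.decode(encoding)


english = "license request"
french = "licence demandée"

# ASCII-only text round-trips even though the two sides disagree.
assert consumer_guessing(producer(english)) == english

# Non-ASCII text is silently corrupted across the component boundary.
garbled = consumer_guessing(producer(french))
print(garbled)  # licence demandÃ©e

# Once both sides agree on what the bytes mean, the text survives.
print(consumer_agreed(producer(french), "utf-8"))  # licence demandée
```

No exception is raised in the failing case: every byte sequence is valid 
Latin-1, so the corruption passes through without any error to alert anyone.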

Michael Kay
Saxonica



