
  • From: Andrew Welch <andrew.j.welch@g...>
  • To: David Carlisle <davidc@n...>
  • Date: Fri, 10 Dec 2010 10:43:32 +0000

On 10 December 2010 09:28, David Carlisle <davidc@n...> wrote:
> On 10/12/2010 08:56, Stephen Green wrote:
>>
>> Does newXML being treatable as a string mean the *UTF-8 default*
>> requirement is better relaxed in some way? I mean, a developer writing
>> a string doesn't want to have to ensure it is all written in UTF-8 do
>> they?
>
> why would any person ever have to know what the utf8 encoding is? If you
> want an "a" then you can enter an a without knowing what the latin1 or ascii
> or utf8 encodings of an a are. They happen to all be the same in that case.
> If you pick another letter such as pound sign, or e acute they happen to be
> different, but since typically a human doesn't know any of the numbers it
> doesn't make any difference, it's just a matter of what your text editor
> does when you hit save.

Yep - the "UTF-8/16 only" suggestion is to solve the problem of a
potential mismatch between the encoding declared in the prolog and the
actual encoding. Add to that the Content-Type header when HTTP is
involved and you have three places to look at to determine the encoding...
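
To make the three places concrete, here is a rough Python sketch. The
function name encoding_claims, the sample header and the sample document
are all made up for illustration, and the prolog check assumes an
ASCII-compatible encoding, so treat it as a sketch rather than how any
real parser does it:

```python
import re

def encoding_claims(content_type, body):
    """Collect what each of the three places says about the encoding."""
    # 1. The HTTP Content-Type header, e.g. "text/xml; charset=utf-8".
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.I)
    http_charset = m.group(1).lower() if m else None

    # 2. The encoding declared in the XML prolog (ASCII-compatible only here).
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([\w.:-]+)["\']', body)
    prolog = m.group(1).decode("ascii").lower() if m else None

    # 3. The actual bytes: all we can do is test a guess, here UTF-8.
    try:
        body.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return http_charset, prolog, valid_utf8

# A document saved as UTF-8, declared as ISO-8859-1, served as windows-1252.
body = '<?xml version="1.0" encoding="ISO-8859-1"?><price>£5</price>'.encode("utf-8")
claims = encoding_claims("text/xml; charset=windows-1252", body)
print(claims)
```

Three places, three different answers, and nothing in the document
itself tells you which one to believe.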

This manifests itself as the common problem of "funny characters" in
the output, where UTF-8 has been parsed as Windows-1252 or Latin-1,
or vice versa, where you get the "invalid byte sequence" error message.
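
Both failure modes are easy to reproduce. A small Python illustration
(the sample text is my own, nothing here comes from a real document):

```python
text = "café £5"

# UTF-8 bytes read as Windows-1252: no error, just "funny characters".
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # cafÃ© Â£5

# Latin-1 bytes read as UTF-8: the decoder rejects the input instead.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("invalid byte sequence:", err)
```

The first case is the nastier one because nothing fails: the wrong
characters just flow through to the output.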

One common cause of this is simply someone editing the XML file in a
text editor such as Notepad... someone updates a value in a config
file, the editor saves it in a different encoding from the one the
prolog declares, and bang, the XML won't parse any more.

Making it UTF-8/16 only fixes the widespread "funny characters"
problem by always parsing in UTF-8/16, and on the flip side can
replace the obscure "Invalid byte sequence.." error message with "This
document is not UTF-8/16, please fix this by blah blah blah" or some
other more helpful message.
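
As a rough idea of what that could look like, here is a Python sketch.
The function name decode_utf_only and the wording of the message are
invented for the example, and it only recognises UTF-16 by its byte
order mark, so it is an illustration of the idea and not a proposal for
how a parser should implement it:

```python
import codecs

def decode_utf_only(data):
    """Decode bytes as UTF-16 (when a BOM is present) or otherwise UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        codec = "utf-16"     # reads the BOM to pick the byte order
    else:
        codec = "utf-8-sig"  # UTF-8, tolerating an optional BOM
    try:
        return data.decode(codec)
    except UnicodeDecodeError as err:
        # Swap the obscure decoder error for something a person can act on.
        raise ValueError(
            "This document is not UTF-8/16 (bad byte at offset %d); "
            "please re-save it as UTF-8." % err.start
        ) from None

print(decode_utf_only("£5".encode("utf-8")))
print(decode_utf_only("£5".encode("utf-16")))

try:
    decode_utf_only("£5".encode("latin-1"))
    message = None
except ValueError as err:
    message = str(err)
    print(message)
```

With only two encodings allowed there is nothing to guess, so the
parser never has to fall back to another encoding and quietly produce
the wrong characters.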

It also fixes the 3-way XML-over-HTTP what's-the-encoding fun...

It also makes removing the prolog easier, and should allow a better
error message when parsing an empty file etc.


-- 
Andrew Welch
http://andrewjwelch.com

