Subject: RE: encoding woes: ISO-8859-1 vs. UTF-8
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Wed, 24 Jul 2002 09:05:31 +0100
> > ISO-8859-1 can only encode the characters in the
> > range 0-255.
>
> That's what I thought as well. How did Saxon
> convert those two control chars into the proper
> encoding for “ and ” even though the input
> XML was declared as ISO-8859-1? I was fully
> expecting the import to fail, but somehow it succeeded.
I have no idea. This isn't done by Saxon, it's done by the XML parser.
If you were using the default parser (AElfred), I think that it actually
accepts bytes x80-x9F with encoding="iso-8859-1", converting them into
characters x80-x9F.
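To illustrate the difference, here is a small sketch using Python's built-in codecs. It does not reproduce AElfred itself; it only shows the byte-to-character mappings involved. Reading bytes 0x93/0x94 as ISO-8859-1 gives the C1 control characters U+0093/U+0094, whereas Windows-1252 (`cp1252` here) maps the same bytes to the curly quotes U+201C/U+201D. That the original file was really written in Windows-1252 is an assumption, not something established in this thread.

```python
# Two bytes as they might appear in a file labelled encoding="iso-8859-1"
raw = bytes([0x93, 0x94])

# ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value,
# so 0x93/0x94 become the C1 control characters U+0093/U+0094.
latin1 = raw.decode("iso-8859-1")
print([hex(ord(c)) for c in latin1])  # ['0x93', '0x94']

# Windows-1252 reuses most of 0x80-0x9F for printable characters,
# including the curly double quotes U+201C and U+201D.
cp1252 = raw.decode("cp1252")
print([hex(ord(c)) for c in cp1252])  # ['0x201c', '0x201d']

# Going the other way, ISO-8859-1 cannot represent code points above 0xFF.
try:
    "\u201c".encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("not encodable in ISO-8859-1:", exc.reason)
```

In other words, a parser that accepts these bytes as ISO-8859-1 hands on control characters, not quotation marks. Any "proper" curly quotes must come from a Windows-1252 interpretation somewhere along the way.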
>
> Good point. For export output, I changed the encoding to
> UTF-8, and that seems to have resolved the problem: the
> export now succeeds. When I open the exported CSV in a hex
> editor, those two chars are shown as hex 93 and 94,
> respectively.
>
Now I really am puzzled.
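A quick way to see why hex 93/94 is surprising: in UTF-8, bytes 0x80-0xBF only ever appear as continuation bytes, so a UTF-8 serializer never writes a lone 0x93 or 0x94. The sketch below, using Python's codecs as a stand-in for whatever serializer was actually used, prints the UTF-8 bytes for both candidate interpretations of the characters. The only common encoding shown here that produces single bytes 93/94 is Windows-1252. Which step in the export pipeline did that is not established here.

```python
# Characters that might reach the serializer, depending on how the input was decoded
candidates = {
    "U+201C (left double quotation mark)": "\u201c",
    "U+201D (right double quotation mark)": "\u201d",
    "U+0093 (C1 control)": "\u0093",
    "U+0094 (C1 control)": "\u0094",
}

for name, ch in candidates.items():
    utf8 = ch.encode("utf-8").hex(" ")
    print(f"{name}: UTF-8 bytes {utf8}")
# U+201C (left double quotation mark): UTF-8 bytes e2 80 9c
# U+201D (right double quotation mark): UTF-8 bytes e2 80 9d
# U+0093 (C1 control): UTF-8 bytes c2 93
# U+0094 (C1 control): UTF-8 bytes c2 94

# Only a single-byte encoding such as Windows-1252 writes the curly quotes as 0x93/0x94.
print("\u201c\u201d".encode("cp1252").hex(" "))  # 93 94
```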
Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list