Subject: Re: Binary characters in XML
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Sun, 29 Jun 2003 09:16:41 +0200
"Michael Leung" <mmhleung@xxxxxxxxxxxxx> wrote in message
news:20030629043817.55987.qmail@xxxxxxxxxxx
> Hi,
>
> I am trying to transform an XML document:
>
> <?xml version="1.0" encoding="utf-8"?>
> <?xml-stylesheet type="text/xsl" href="bin.xsl"?>
> <doc>
> <binary>
> �
> </binary>
> </doc>
>
> into a binary file with the contents being the value of
> the binary element in the above using the following
> XSLT stylesheet:
>
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output encoding="utf-8" />
>
> <xsl:template match="binary">
> <xsl:value-of select="."/>
> </xsl:template>
> </xsl:stylesheet>
>
> I tried using MSXSL from Microsoft and I got an "Invalid unicode
> character" error.
>
> I also tried using Saxon and I got an "illegal XML character " error.
> In IE, those characters are displayed as rectangles ().
>
> I wonder why the XSLT processors are complaining about these
> characters and I wonder if it is possible to carry out such a
> transformation.
It is not the XSLT processors that are complaining -- it is the XML parsers.
The XML 1.0 Recommendation strictly defines which characters may appear in an
XML document:

   Character Range
   [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
   /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

http://www.w3.org/TR/REC-xml#charsets
Therefore, the only characters with a code point below #x20 that may appear in
an XML document are #x9 (tab), #xA (line feed) and #xD (carriage return).
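For illustration, here is a minimal sketch of the Char production as a Python
predicate. The function names is_xml_char and find_illegal_chars are
hypothetical helpers, not part of any XML library; the ranges are copied
directly from production [2].

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production [2]."""
    return (
        cp in (0x9, 0xA, 0xD)          # the only allowed code points below #x20
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates and FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(text: str) -> list:
    """List (index, 'U+XXXX') pairs for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch))
    ]


if __name__ == "__main__":
    # Tab, LF and CR are the only code points below #x20 that pass.
    print([hex(cp) for cp in range(0x20) if is_xml_char(cp)])  # ['0x9', '0xa', '0xd']
    print(find_illegal_chars("ok\x01\x1f\tend"))  # [(2, 'U+0001'), (3, 'U+001F')]
```

Running find_illegal_chars over the content of the <binary> element would
flag exactly the bytes the parsers object to.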
The XML parsers you cite implement this specification and correctly report an
error for any character outside the Char production, which is exactly the case
you describe.
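To see that the error comes from the parser rather than from any XSLT step, a
sketch using Python's expat-based xml.etree.ElementTree (chosen here only as a
convenient conforming parser) rejects such a document before a stylesheet could
ever run. The helper name parses_ok is hypothetical. Note that in XML 1.0 even a
character reference such as &#x1; is not well-formed, so escaping does not help.

```python
import xml.etree.ElementTree as ET


def parses_ok(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError as err:
        # The parser reports the offending position; no stylesheet is involved.
        print("rejected:", err)
        return False


if __name__ == "__main__":
    good = b'<?xml version="1.0" encoding="utf-8"?><doc><binary>\tA</binary></doc>'
    raw_ctrl = b'<?xml version="1.0" encoding="utf-8"?><doc><binary>\x01</binary></doc>'
    char_ref = b'<?xml version="1.0" encoding="utf-8"?><doc><binary>&#x1;</binary></doc>'
    print(parses_ok(good))      # True
    print(parses_ok(raw_ctrl))  # False
    print(parses_ok(char_ref))  # False
```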
=====
Cheers,
Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list