Subject: RE: unicode numeric character references in xml output
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 30 May 2008 22:21:09 +0100
You can't distinguish characters that were written in the source document as
themselves from characters that were written as numeric character references
- the XML parser doesn't provide this information.
You can force all characters outside the Latin-1 range to be output as
character references by specifying <xsl:output encoding="iso-8859-1"/>
(or encoding="us-ascii" to cover every non-ASCII character).
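A minimal sketch of that approach, assuming the result is serialized by the XSLT processor itself rather than handed to a DOM or a custom serializer. It is an identity transform whose only job is to set the output encoding. It uses us-ascii rather than iso-8859-1 so that characters in the range U+0080 to U+00FF are escaped as well. The serializer decides whether each reference is written in decimal or hex, so the output will not necessarily match the hex form used in the source.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Any character that us-ascii cannot represent is written
       by the serializer as a numeric character reference. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this in place, a right single quotation mark (U+2019) in the result tree is serialized as a character reference whether the source wrote it literally or as a reference.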
I tend to think there's something a bit wrong with your system design if it
depends on getting this right. It shouldn't matter how characters are
represented, any more than it matters whether the input is on a local disk
or on the web - you need to get your architectural layering right.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: a k laue [mailto:quiotl@xxxxxxxxx]
> Sent: 30 May 2008 18:28
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: unicode numeric character references in xml output
>
> Hello,
>
> I'm transforming XML to XML, and I need to pass through the
> unicode numeric character references (hex) from the source to
> the output. That is, I need "&#x2019;" in the input to appear
> as "&#x2019;" in the output. I'm using XSLT 2.0 and the Saxon
> 9B processor.
>
> Unfortunately, the set of possible character references is
> large. (The transform works on a very large set of scientific
> articles. These may include special characters in author
> names, article titles,
> etc.) I originally looked to character maps as the solution,
> but I don't see how to map the entire unicode set.
>
> Thanks,
> Andrea