Subject: Re: Special entity characters in Shift-JIS XSL.
From: Tony Graham <tgraham@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Dec 1999 13:05:11 -0400 (EST)
At 15 Dec 1999 08:55 -0500, Douglas Weed wrote:
> An application has been developed which uses the Microsoft MSXML parser
> enclosed in a DLL to apply XSL files against an XML stream. The encoding is
> in Shift-JIS as the application is double byte. The net result of the
> application is HTML. The target browser has been developed to understand
> certain 'special characters' or entities, which in themselves are double
> byte. Much in the same way &#39; maps to an apostrophe. For example
> &#249;&#134; would yield a special 2 byte character which is a Q surrounded
> by a circle. If this character sequence is placed directly into a .htm
> page, it works. However, as I suspected, when placed within an xsl file and
> transformed with the xml, it yields nothing since the parser tries to format
> it. I attempted to use an in-line DTD to define the entity and use the
> definition within the XML file, however, MSXML has some real difficulties
> handling an in-line DTD when the XML is a character string and not a file.
> The work-arounds specified by MS are not feasible. The question : does
> another technique exist to have the XSL file ignore &#249;&#134; and pass it
> straight through to the HTML stream? Sorry for the length of the message
> and thanks for any responses.
In XML, numeric character references are always to Unicode code
values. A conforming application should recognise &#249;&#134; as
LATIN SMALL LETTER U WITH GRAVE (U+00F9) followed by one of the C1
control characters (U+0086).
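As a rough illustration (a minimal Python sketch using only the standard
library, not anything MSXML-specific), a conforming parser resolves each
reference to its own Unicode character. The element name "r" is made up
for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Parse a tiny document containing the two references from the question.
# The element name "r" is arbitrary, chosen only for this sketch.
text = ET.fromstring("<r>&#249;&#134;</r>").text

# Each reference becomes one Unicode character, not one byte of a pair.
code_points = [f"U+{ord(ch):04X}" for ch in text]
first_name = unicodedata.name(text[0])
second_category = unicodedata.category(text[1])  # "Cc" means control character

print(code_points, first_name, second_category)
```

The parser never treats the two references as halves of one Shift-JIS
character; it hands back two separate code points.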
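What comes out of your MSXML DLL almost certainly uses two bytes to
represent each of those characters -- UTF-16 uses two bytes per
character in the Basic Multilingual Plane, and UTF-8 also uses two
bytes per character for character numbers in that range. Either way,
the two characters occupy four bytes, not the two-byte sequence your
browser expects.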
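The byte counts can be checked with a short Python sketch. It assumes
the output is UTF-8 or UTF-16 (I have not verified what MSXML actually
emits), and it takes the expected double-byte sequence to be the bytes
249 and 134 from the question.

```python
# The two characters that the references &#249; and &#134; resolve to.
chars = "\u00f9\u0086"

utf8 = chars.encode("utf-8")
utf16 = chars.encode("utf-16-le")  # little-endian, no byte order mark

# The raw double-byte sequence the browser was expecting (an assumption
# based on the reference values 249 and 134 in the question).
expected = bytes([249, 134])

print(utf8.hex(), utf16.hex(), expected.hex())
print(len(utf8), len(utf16), len(expected))
```

In neither encoding do the output bytes match the original two-byte
sequence, which is why the browser does not see its special character.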
Relying on two numeric character references to represent a double-byte
sequence is fragile, as you have found.
The numeric character reference for the Unicode character CIRCLED
LATIN CAPITAL LETTER Q is &#x24C6; (decimal &#9414;).
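If you want to derive such a reference rather than look it up in the
code charts, something like the following Python sketch would do it. It
relies only on the standard unicodedata module; the variable names are
just for illustration.

```python
import unicodedata

# Look the character up by its Unicode name instead of hard-coding it.
ch = unicodedata.lookup("CIRCLED LATIN CAPITAL LETTER Q")

# Build the decimal and hexadecimal forms of the character reference.
decimal_ref = f"&#{ord(ch)};"
hex_ref = f"&#x{ord(ch):X};"

print(decimal_ref, hex_ref)
```

Both forms refer to the same character; which one you write in the
stylesheet is a matter of taste.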
I don't know that MSXML allows you to specify the output encoding.
However, if I'm correct in thinking that a circled Q is gaiji in
Shift-JIS, the character might be dropped in a conversion to Shift-JIS
anyway.
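To see what a conversion might do, here is a small Python sketch. It
uses Python's own "shift_jis" codec as a stand-in; that is an
assumption, since the conversion tables used by MSXML, Windows, or a
vendor-specific Shift-JIS variant with gaiji may well differ.

```python
circled_q = "\u24c6"  # CIRCLED LATIN CAPITAL LETTER Q

# Strict encoding fails if the codec has no mapping for the character.
try:
    circled_q.encode("shift_jis")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# "ignore" silently drops the character; "xmlcharrefreplace" falls back
# to a numeric character reference instead.
dropped = circled_q.encode("shift_jis", errors="ignore")
as_reference = circled_q.encode("shift_jis", errors="xmlcharrefreplace")

print(encodable, dropped, as_reference)
```

A converter that behaves like the "ignore" case would lose the
character, while one that falls back to a character reference would at
least leave something a Unicode-aware browser could display.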
Regards,
Tony Graham
======================================================================
Tony Graham mailto:tgraham@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list