Subject: Re: Special entity characters in Shift-JIS XSL.
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 17 Dec 1999 10:05:07 GMT
> I think the OPPOSITE of flaky is the word I would use to describe an entity
> identification paradigm that allows the entity to remain in its encoded
> form, yet still be identified as an entity. I think solid is more the word.
You could build a solid system on that basis, but it wouldn't be XML.
> how can it then be passed to any more parsers expecting 7-bit ASCII
> characters?
The XML character set is _always_ Unicode. If the encoding isn't the default
UTF-8 or UTF-16, not all of the character set may be directly accessible as
character data, but you can always use the &# syntax to access any
Unicode character. An XML parser _has_ to treat `&#65;' and `A' in an
identical manner and report `character number 65' to the application,
whichever version was in the input file. If your application _needs_
to see `&#65;' and not `A' then it isn't an XML application (it could be
an SGML one).
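As an illustration, here is a minimal Python sketch of that behaviour. Python
and its standard xml.etree.ElementTree module are assumptions chosen for the
example, not something from this thread; any conforming parser should behave
the same way. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same character written three ways: literally, as a decimal
# character reference, and as a hexadecimal character reference.
literal = ET.fromstring("<doc>A</doc>")
decimal_ref = ET.fromstring("<doc>&#65;</doc>")
hex_ref = ET.fromstring("<doc>&#x41;</doc>")

# The application only ever sees character number 65; how it was
# spelled in the input file is not reported.
for doc in (literal, decimal_ref, hex_ref):
    print(repr(doc.text), ord(doc.text))

# A character outside the declared encoding is still reachable through
# the &# syntax: U+65E5 cannot be written directly in US-ASCII.
ascii_doc = b'<?xml version="1.0" encoding="US-ASCII"?><doc>&#x65E5;</doc>'
wide = ET.fromstring(ascii_doc)
print(hex(ord(wide.text)))
```

All three short documents hand the application the same one-character string,
so code downstream of the parser cannot tell which spelling was used.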
> What if each of those parsers followed the spec, the first
> transforming the character into a 2-byte unicode character, leaving the
> others to see the two bytes as simply two different characters in the
> stream?
This can't happen, as in a well-formed XML document you _always_ know
if a multi-byte encoding is being used. Either the <?xml declaration
specifies a single-byte encoding such as Latin-1, or a multi-byte
encoding is being used (UTF-8 unless the first two bytes of the file are
the BOM, in which case it's UTF-16).
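A rough Python sketch of that decision, following only the simplified rule
described above (it is not the full autodetection procedure from the XML
specification, and the function name sniff_encoding is hypothetical):

```python
import re

# Matches an encoding name inside a leading <?xml ...?> declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_encoding(data: bytes) -> str:
    """Guess how the bytes of an XML document are encoded."""
    # A byte order mark in the first two bytes means UTF-16.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    # Otherwise trust the encoding named in the XML declaration, if any.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declared encoding: the default is UTF-8.
    return "utf-8"

print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
print(sniff_encoding("<doc/>".encode("utf-16")))
print(sniff_encoding(b"<doc/>"))
```

Either way, every parser in the chain starts from the same bytes and reaches
the same answer about the encoding, so none of them will mistake one
multi-byte character for two separate ones.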
David
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list