Subject: RE: Using Entity References in XSL Templates
From: Arjun Ray <aray@xxxxxx>
Date: Mon, 24 Jan 2000 06:28:31 -0500 (EST)
On Fri, 21 Jan 2000, Mike Brown wrote:
> HTML 4 uses numeric entities ...
Please note that _character reference_ and _entity reference_ are distinct
categories. In particular, the former is not an "entity" at all.
> ... to refer exclusively to code positions in the document's character
> set, while named entities refer to character positions in either
> ISO-8859-1 or UCS, depending on which entity you're referring to.
Not quite. In the entity declarations, the entities are defined in terms
of character references. That's OK, because the connection between
ISO-8859-1/UCS and the document character set is determined by the SGML
declaration.
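As a rough illustration (Python is not part of this thread; it is used here only as a convenient sketch), the standard library's html.entities.name2codepoint table mirrors the HTML 4 entity declarations: each name maps to a code position, which is what a declaration such as <!ENTITY nbsp CDATA "&#160;"> expresses. The helper name as_character_reference is hypothetical.

```python
from html.entities import name2codepoint


def as_character_reference(entity_name: str) -> str:
    """Return the numeric character reference equivalent to a named HTML 4 entity."""
    # name2codepoint maps HTML 4 entity names to code points (integers).
    return "&#{};".format(name2codepoint[entity_name])


# Print the character reference behind a few named entities.
for name in ("nbsp", "eacute", "euro"):
    print("&{};".format(name), "->", as_character_reference(name))
```

No encoding is mentioned anywhere in the mapping: the names resolve to code positions only.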
> In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a
> non-breaking space ... but &#160; is simply character number 160 in the
> character set of the document encoding,
Oops. Not at all. Absolutely not. The encoding has absolutely nothing,
repeat **NOTHING** to do with this. The I18n spec is required reading:
http://www.ietf.org/rfc/rfc2070.txt
(It's the job of the "entity manager" to transcode from the encoding to
the document character set.)
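To make that concrete, here is a minimal sketch in Python (an assumption of this illustration, not something from the thread). The decode step stands in for the entity manager, and html.unescape stands in for a parser resolving references against a UCS document character set, as RFC 2070 specifies for HTML. The names SOURCE, parse and ENCODINGS are hypothetical.

```python
import html

# Markup containing a numeric character reference; every byte is plain ASCII.
SOURCE = "a&#160;b"

# A few unrelated encodings that can all carry the markup above.
ENCODINGS = ("ascii", "cp437", "koi8_r", "utf-16")


def parse(raw: bytes, encoding: str) -> str:
    """Transcode bytes to the document character set, then resolve references."""
    text = raw.decode(encoding)   # the "entity manager" step
    return html.unescape(text)    # &#160; -> code position 160, i.e. U+00A0


# The same markup, shipped in each encoding and parsed back.
results = {enc: parse(SOURCE.encode(enc), enc) for enc in ENCODINGS}

# Contrast: a raw byte 0xA0 does depend on the encoding (in cp437 it is U+00E1).
raw_a0_in_cp437 = bytes([0xA0]).decode("cp437")

for enc, value in results.items():
    print(enc, [hex(ord(ch)) for ch in value])
```

Under these assumptions every entry in results is the same three characters, with U+00A0 in the middle. Only the interpretation of raw bytes varies with the encoding, never the meaning of the reference.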
These references may also be helpful:
http://www.mulberrytech.com/papers/docchar.htm
http://www.hut.fi/u/jkorpela/chars.html
http://candl.let.ruu.nl/Archive/cts/html/scharacterset.htm
Arjun
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list