Subject: Re: Character references, entities, XSL and cocoon
From: Elliotte Rusty Harold <elharo@xxxxxxxxxxxxxxx>
Date: Fri, 10 Sep 1999 07:52:33 -0400
>Cross-posted to xml-l. Please excuse the duplication.
>
>Hello colleagues,
>
>I'm creating an XML version of an art theory scholarly manuscript that
>includes ancient Greek characters (with breathing marks, accents, etc.).
>I've run into some problems and would appreciate any help you could provide.
>I decided to use Unicode character references for the ancient Greek
>characters. With IE5 (newly equipped with the Athena font) the characters
>were successfully rendered on my screen using CSS (question 1 -- although
>they would not print! why?). However, I need to make this project
>accessible to a broader audience than IE5 users, so I've begun work with
>Cocoon, an Apache/Jserv servlet that will transform my XML into HTML using
>XSLT.
>
>Okay so far, but the character references in my XML document show up in the
>transformed HTML document as entity references, not rendered Greek. (Some
>character references show up as question marks -- is this the parser or
>processor not able to recognize less common Unicode characters?) Anyway,
>I'd very much appreciate help in understanding what's going on, and
>information about how I can pass my XML character references to the
>transformed HTML document.
>
I've encountered this myself. For instance see
http://metalab.unc.edu/xml/books/bible/errata/05.html for just another
example of the problem. The issue is that although HTML 4.0 defines many
entities like &Omega; for capital Greek omega, browsers generally don't yet
support these entity references. There's not a lot you can do about this in
the general case. For an occasional word or quotation, I just use the
references anyway and hope that readers will understand. For a longer
passage, you can try using an output encoding like UTF-8 or ISO-8859-7 that
actually includes the characters you want. Then you'd put a META tag in
your header like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-7">
Not all browsers will pick this up, or be able to display the right
character set even if they do recognize it; but some will.
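
As a rough illustration of the encoding trade-off, here is a minimal
sketch in Python (used purely for illustration; the helper names
encode_html_text and meta_tag are hypothetical and are not part of Cocoon
or any XSLT processor). It assumes you control the bytes that get served.
Note that ISO-8859-7 covers only monotonic Greek, so letters with breathing
marks need UTF-8 or a fallback to numeric character references:

```python
def encode_html_text(text: str, charset: str) -> bytes:
    """Encode text for an HTML page served in `charset`.

    Characters the charset cannot represent are written as numeric
    character references (&#NNNN;) rather than being replaced by '?'.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


def meta_tag(charset: str) -> str:
    """Build a META declaration that matches the chosen output encoding."""
    return f'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={charset}">'


omega = "\u03a9"        # GREEK CAPITAL LETTER OMEGA
alpha_psili = "\u1f00"  # GREEK SMALL LETTER ALPHA WITH PSILI (smooth breathing)

print(meta_tag("ISO-8859-7"))
print(encode_html_text(omega, "iso-8859-7"))        # one byte in ISO-8859-7
print(encode_html_text(alpha_psili, "iso-8859-7"))  # not in the charset: b'&#7936;'
print(encode_html_text(alpha_psili, "utf-8"))       # three bytes in UTF-8
```

Whichever charset you pick, the META declaration and the actual bytes have
to agree, or the browser will show the wrong characters.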
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@xxxxxxxxxxxxxxx | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list