Subject: RE: Character entities in attribute values
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 23 Apr 2003 19:17:28 +0100
It looks like a simple explanation - you were using a product with a
serious bug in it.
Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx
> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> mark_fletcher@xxxxxxxxxxxxxx
> Sent: 23 April 2003 18:01
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Character entities in attribute values
>
>
>
> Hi Mike (and others who have responded),
>
> First, I've found and fixed the problem. I'm using
> Arbortext's E3 product to do my processing and there was an
> instruction in their internal code to write out non-ASCII
> characters as numeric character references. So, that's how
> the accented Unicode characters in the tag attributes became
> character references. Once I fixed that problem, the HTML
> output was fine, as there were no ampersands in any of the
> attribute values.
>
> However, it still sounds like you're all saying that even
> when a character reference does exist in an attribute value,
> I should not be seeing escaped ampersands when that attribute
> value is output as text. Well, if anyone's interested (and
> I'm not sure why you would be, at this point ;-) here's a
> sample of my previous input and output data and my xsl code
> that demonstrates the problem I was having:
>
> source xml tag:
>
> <xref linkend="i090f42a68009c2c9" book_code="cmkt"
> book_title="Guide Marketing du syst&#xe8;me GRC de
> PeopleSoft, version 8.8" chapter_title="D&#xe9;finition des
> entit&#xe9;s de l'application Marketing de PeopleSoft"
> XREF_type="3" target_title="D&#xe9;finition des entit&#xe9;s
> de l'application Marketing de PeopleSoft"
> chapter_type="Chapitre" file_name="cmkt03.htm"/>
>
> xsl template for this element:
>
> <xsl:template name="xref">
> <A
> HREF="../../{@book_code}/htm/{@file_name}#{@linkend}"><xsl:value-of
> select="@target_title"/></A>
> </xsl:template>
>
> html output:
>
> <A
> HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition
> des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
>
> Mark Fletcher
> PeopleSoft Language Engineering
> 925.694.3753
> mark_fletcher@xxxxxxxxxxxxxx
>
> "Mike Brown" <mike@xxxxxxxx>
> Sent by: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> 04/23/2003 06:05 AM
> Please respond to xsl-list
>
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> cc:
> Subject: Re: Character entities in attribute values
>
> mark_fletcher@xxxxxxxxxxxxxx wrote:
> > the output text looks something like this: &amp;eacute; instead of
> > this: é
>
> First, please realize that when you output XML or HTML, the
> XSLT processor is (effectively, if not literally) running a
> node tree through a serializer, and the serializer is what is
> escaping "&" and "<" and certain other characters appearing
> in places where they would otherwise be confused with markup.
>
> If you're getting &amp;eacute; in the output, then you must
> have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";"
> into an attribute node (or text node, but you mentioned
> attribute) in your result tree, perhaps by copying this text
> from the source tree. Since you told the processor you wanted the
> *node* to contain those 8 characters, rather than 1 entity
> reference, it serialized the node in such a way that you'd
> get the characters when the output document is parsed. In
> other words, it preserved the semantics of the data, clearly
> distinguishing between character data and the structures
> implied by markup.
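
A minimal sketch of the serializer behaviour described above. It uses Python's standard xml.etree.ElementTree as a stand-in for an XSLT processor's serializer, which is an assumption made purely for illustration (the thread names no such tool), and the helper name serialize_text is hypothetical. It contrasts a node holding the 8 literal characters with a node holding the single character U+00E9.

```python
import xml.etree.ElementTree as ET


def serialize_text(text: str) -> str:
    """Build an <A> element whose text node holds `text`, then serialize it."""
    elem = ET.Element("A")
    elem.text = text
    # encoding="unicode" returns a str and leaves non-ASCII characters as-is.
    return ET.tostring(elem, encoding="unicode")


# Node contains the 8 characters & e a c u t e ;
# The serializer escapes the ampersand so that a parser reading the
# output gets those same 8 characters back.
print(serialize_text("&eacute;"))   # <A>&amp;eacute;</A>

# Node contains the single character U+00E9; nothing needs escaping.
print(serialize_text("\u00e9"))     # <A>é</A>
```

In both cases the serializer is preserving the content of the node; the escaped ampersand only appears when the node really does contain an ampersand.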
>
> Given that the XML parser feeding parsed data to the XSLT
> processor would have interpreted "&eacute;" in your original
> source document as a reference to the entity named eacute,
> there's no way the 8 characters could have ended up in your
> result tree unless you did one of the following:
> - explicitly constructed that string in your stylesheet
> - copied text that was originally written like &amp;eacute;
> - copied text that was originally written like <![CDATA[&eacute;]]>
>
> The latter two mean exactly the same thing, and since one of
> the most common misconceptions on this list concerns
> what CDATA sections are, I'm going to guess that
> whoever made your XML decided to try to use it as a transport
> for entity-laden, non-well-formed HTML, saying that this data
> is just text, not markup. Then you tried to use XSLT to copy
> it through, and were surprised to see that you can't use XSLT
> to pretend character data is actually markup.
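
To illustrate the point about CDATA, here is a small sketch, again assuming Python's standard xml.etree.ElementTree as the parser and serializer (not a tool mentioned in the thread; the helper name text_of is hypothetical). It parses an escaped ampersand and a CDATA section and shows that both produce the same character data, which is then escaped again on output.

```python
import xml.etree.ElementTree as ET


def text_of(xml: str) -> str:
    """Parse a one-element document and return the element's text content."""
    return ET.fromstring(xml).text


escaped = "<p>&amp;eacute;</p>"
cdata = "<p><![CDATA[&eacute;]]></p>"

# Both spellings give the parser the same 8 characters of text.
print(text_of(escaped))                     # &eacute;
print(text_of(escaped) == text_of(cdata))   # True

# Once parsed, nothing records that a CDATA section was used,
# so the ampersand is escaped when the tree is written out again.
print(ET.tostring(ET.fromstring(cdata), encoding="unicode"))  # <p>&amp;eacute;</p>
```

A CDATA section is only an alternative way of escaping text in the source; it does not mark its contents as markup to be passed through.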
>
> However, as others have mentioned, this is just a wild guess.
> Explain more about what you're doing, with sample code (brief).
>
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list