Subject: RE: double escaping problem [re-visited]
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 13 Nov 2007 09:14:21 -0000
> Hmmm. I was afraid of that. I am still baffled as to how to
> go about telling my stylesheet that the input it gets from a
> particular source tree by way of the document() function that
> it will have already been escaped and therefore that '&'
> need not be escaped again (making it '&amp;').
>
The document() function invokes an XML parser and it can only do what an XML
parser does.
In fact an XML parser removes one level of escaping, and a serializer adds
it back. So the parser turns "&amp;" into "&" and "&amp;amp;" into "&amp;",
and the serializer turns them back into "&amp;" and "&amp;amp;"
respectively, unless d-o-e is set, in which case they are turned into "&"
and "&amp;" respectively. All the evidence is that your XML source as read
by the parser was actually double-escaped. This quite often happens when you
have fragments of XML stored in a database: if you try to extract it as XML,
and the database software doesn't realise that it's already in XML format,
then the database software adds a level of escaping that you don't want. The
way to get rid of it is to change the way you do the database query.
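
The round trip described above can be illustrated outside XSLT. The following is only a minimal sketch in Python, using the standard library's xml.etree.ElementTree rather than any XSLT processor; the helper name round_trip and the sample strings are made up for illustration. It shows the parser removing one level of escaping and the serializer adding it back, for both a correctly escaped and a double-escaped input.

```python
import xml.etree.ElementTree as ET


def round_trip(xml_text):
    """Parse xml_text, then return (text seen by the application, re-serialized XML).

    round_trip is a hypothetical helper for this illustration only.
    """
    elem = ET.fromstring(xml_text)                 # parser removes one level of escaping
    serialized = ET.tostring(elem, encoding="unicode")  # serializer adds it back
    return elem.text, serialized


# Correctly escaped source: the application sees a bare ampersand.
single_text, single_out = round_trip("<p>Tom &amp; Jerry</p>")

# Double-escaped source: the application sees the literal characters "&amp;",
# and the serializer escapes that ampersand again on output.
double_text, double_out = round_trip("<p>Tom &amp;amp; Jerry</p>")

print(repr(single_text), single_out)
print(repr(double_text), double_out)
```

In the second case the parser and serializer are both behaving correctly; the extra level of escaping was already present in the input, which is why the fix belongs at the point where the data is extracted.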
Michael Kay