Subject: RE: recognize character entities
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 29 Aug 2006 15:00:16 +0100
> is there a way to recognize and filter only elements which
> text() begins with a character entity?
>
> <Element>&Amp;This is filtered</Element>
> <Element>&epsilon;This filtered</Element>
> <Element>&euro;This is filtered</Element>
> <Element>This is *not* filtered</Element>
> <Element>And this also not</Element>
Technically these are all "entity references", not "character entities". If
you wrote "&#x20AC;", that would be a "character reference".
You can't detect these in XSLT, because the XML parser expands the entity
reference before the XSLT processor gets to see it. If you really need to
distinguish a Euro sign written as &euro; from one written as a real Euro
character (or from one written as &#x20AC;, if that's the right code), then you
need to preprocess the XML to flag these so they survive the journey through
the XML parser. For example, you could use a Perl script that replaces
&euro; by <?ent euro?>.
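A minimal sketch of such a preprocessing step, written in Python rather than Perl. The function name flag_entities and the decision to leave the five predefined XML entities untouched are assumptions made for illustration, not part of the original suggestion. It is a plain text substitution, so it would also rewrite references inside attribute values and CDATA sections, where a processing instruction is not allowed; real input may need more care.

```python
import re

# Matches named entity references such as &euro; or &epsilon;.
# Numeric character references (&#8364;, &#x20AC;) begin with '#', which the
# name pattern excludes, so they are left alone.
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")

# Assumption: the five entities predefined by XML stay as they are, because
# replacing them would lose the escaping of markup-significant characters.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def flag_entities(xml_text: str) -> str:
    """Replace &name; with <?ent name?> so the reference survives parsing."""
    def replace(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)  # keep &amp; and friends unchanged
        return "<?ent " + name + "?>"

    return ENTITY_REF.sub(replace, xml_text)


if __name__ == "__main__":
    print(flag_entities("<Element>&euro;This is filtered</Element>"))
```

After this step the stylesheet could probably test for the flag as the first child node, with a pattern along the lines of Element[node()[1][self::processing-instruction('ent')]], and turn each flag back into whatever output is wanted.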
But this is against the spirit of XML: the entity reference is supposed to
be treated by the receiving application in exactly the same way as its
expansion would be treated.
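To illustrate the point about expansion, here is a small sketch using Python's standard xml.etree.ElementTree parser (not an XSLT processor, but the same principle applies to any conforming XML parser). The internal DTD subset declaring "euro" is an assumption added so that the first document is well-formed on its own.

```python
import xml.etree.ElementTree as ET

# The same Euro sign written three ways. The internal DTD subset declares
# a "euro" entity (an assumption for this example) so &euro; can be resolved.
as_entity = '<!DOCTYPE Element [<!ENTITY euro "&#8364;">]><Element>&euro;</Element>'
as_char_ref = "<Element>&#x20AC;</Element>"
as_literal = "<Element>\u20ac</Element>"

# Parse each document and keep only the text content of the root element.
texts = [ET.fromstring(doc).text for doc in (as_entity, as_char_ref, as_literal)]

# After parsing, nothing records which spelling was used in the source.
print(texts[0] == texts[1] == texts[2])
```

All three documents hand the application the same one-character string, which is why a match pattern has nothing to distinguish them by.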
Michael Kay
http://www.saxonica.com/
>
> in a template match like
>
> <xsl:template match="Element[starts-with(text(),
> 'recognize_a_character_entity_here')]">
> <NewElement>
> <xsl:apply-templates select="@* | node()"/>
> </NewElement>
> </xsl:template>
>
> for mathematical xml-files we have a lot (around 2000)
> character entities to recognize and no chance to select them
> individually.
>
> thanks in advance
> for any help
>
> frank
>
> (thanks to mukul, abel and michael for answering my
> apostrophe/ quotation mark question)
> --
> Frank Marent
> CTO
>
> emnemics ag
> Jungholzstrasse 43
> CH-8050 Zürich
>
> Tel +41 44 307 32 71
> Fax +41 44 307 32 75
> Mail frank.marent@xxxxxxxxxxx
> Skype frank.marent
> URL www.emnemics.ch
>
> Ein Unternehmen der Kalaidos Bildungsgruppe Schweiz