Subject: RE: XSLT 2.0 : Unicode hex notation in regular expressions
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Thu, 12 Aug 2004 12:12:08 +0100
The notation \u1234 is not supported in XPath 2.0 regular expressions. Use
the XML character reference &#x1234; instead.
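
A rough, untested sketch of that change applied to the Arabic range in the stylesheet quoted below (it assumes the same input document and a processor implementing the XPath 2.0 regex syntax). The XML parser expands the character references before the regex engine sees the pattern, so the character class receives the two literal endpoint characters:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <words>
      <xsl:for-each select="tokenize(., '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(., '[a-z]+')">latin</xsl:when>
              <!-- character references, not \u escapes: the XML parser
                   turns them into the literal characters U+0600 and U+06FF -->
              <xsl:when test="matches(., '[&#x600;-&#x6FF;]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

The doubled backslash was accepted because \\ is a valid escape for a literal backslash. The class [\\u0600-\\u06FF] therefore reads as the characters \, u, 0, 6, 0, then the range from 0 to \ (codepoints 48 to 92, which contains ":" at 58), then u, 0, 6, F, F. That would explain why the colon matched and the Arabic word did not. The block escape \p{IsArabic} from the XML Schema regex syntax should be an alternative way to name the same range, though whether a given processor release supports it is an assumption to check.
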
Michael Kay
> -----Original Message-----
> From: Pierrick Brihaye [mailto:pierrick.brihaye@xxxxxxxxxx]
> Sent: 12 August 2004 10:38
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: XSLT 2.0 : Unicode hex notation in regular expressions
>
> Hi,
>
> I don't know if my XSLT syntax is wrong or if it is a Saxon-related
> problem. Let's blame the XSLT writer rather than the XSLT processor
> first ;-)
>
> Given the following XML :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <text>livre : كتاب</text>
>
> And the following XSLT :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:template match="/text">
>     <xsl:comment><xsl:value-of
>       select="system-property('xsl:vendor')"/></xsl:comment>
>     <words>
>       <xsl:for-each select="tokenize(.,'\s+')">
>         <word>
>           <xsl:attribute name="language">
>             <xsl:choose>
>               <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
>               <xsl:when
>                 test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
>               <xsl:otherwise>whatever</xsl:otherwise>
>             </xsl:choose>
>           </xsl:attribute>
>           <xsl:attribute name="codepoints"><xsl:value-of
>             select="string-to-codepoints(.)"/></xsl:attribute>
>           <xsl:value-of select="."/>
>         </word>
>       </xsl:for-each>
>     </words>
>   </xsl:template>
> </xsl:stylesheet>
>
> I get :
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--SAXON 8.0 from Saxonica-->
> <words>
> <word language="latin" codepoints="108 105 118 114
> 101">livre</word>
> <word language="arabic" codepoints="58">:</word>
> <word language="whatever" codepoints="1603 1578 1575
> 1576">كتاب</word>
> </words>
>
> Why this curious match for codepoint 58? And why no match for the
> Arabic characters?
>
> BTW, I first tried : matches(.,'[\u0600-\u06FF]+') as stated by
> http://www.unicode.org/reports/tr18/#Hex_notation
>
> But Saxon returned the following error :
>
> Error at xsl:when on line 11 of file:/C:/...:
> net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at
> character 2 in regular expression: bad escape sequence
>
> That's why I doubled the "\" character. Is this doubling
> spec-compliant ?
>
> Cheers,
>
> p.b.