Subject: RE: xsl:analyze-string problem
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 8 Feb 2007 17:00:55 -0000
I would tackle this as follows:
Step 1: classify the element. Use xsl:choose and matches() to decide which
of the four categories it belongs to, and copy the element adding an
attribute to indicate the category.
Step 2: do the grouping (concatenation of adjacent elements according to
your rule C). Probably using xsl:for-each-group group-adjacent, but I'm not
entirely clear about the criteria.
Step 3: use analyze-string on the contents of the grouped elements to insert
<ordinal> and <text> element children.
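
A rough, untested sketch of how the three steps might fit together, in case
it helps. It rests on several assumptions that are mine and not taken from
your post: all the <e> elements are children of a single parent, the
category attribute is called "class", and I read rule C as "start a new
group whenever the current element begins with an ordinal (classes 3 and 4)
or the previous element ended with one (classes 2 and 3)". Because that
reading is about where a group begins, the sketch uses group-starting-with
rather than group-adjacent. Whitespace around an ordinal is attached to the
<ordinal> element, which your rules appear to allow.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: every <e> to be merged is a child of the same parent. -->
  <xsl:template match="*[e]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>

      <!-- Step 1: build classified copies in a temporary tree -->
      <xsl:variable name="classified">
        <xsl:apply-templates select="e" mode="classify"/>
      </xsl:variable>

      <!-- Step 2: a group starts at an element that begins with an
           ordinal, or that follows an element ending with an ordinal
           (my reading of rule C) -->
      <xsl:for-each-group select="$classified/e"
          group-starting-with="e[@class = ('3', '4')
              or preceding-sibling::e[1]/@class = ('2', '3')]">

        <!-- Step 3: split the concatenated content into ordinals and text -->
        <xsl:analyze-string select="string-join(current-group(), '')"
            regex="\s*[0-9]+\.\s*">
          <xsl:matching-substring>
            <ordinal><xsl:value-of select="."/></ordinal>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <text><xsl:value-of select="."/></text>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Step 1 helper: copy the element and record its category.
       The attribute name "class" is hypothetical. -->
  <xsl:template match="e" mode="classify">
    <xsl:copy>
      <xsl:attribute name="class">
        <xsl:choose>
          <!-- 3) just a number and a period -->
          <xsl:when test="matches(., '^\s*[0-9]+\.\s*$')">3</xsl:when>
          <!-- 4) a number and a period, then more content -->
          <xsl:when test="matches(., '^\s*[0-9]+\.\s')">4</xsl:when>
          <!-- 2) content ending in a number and a period -->
          <xsl:when test="matches(., '[0-9]+\.\s*$')">2</xsl:when>
          <!-- 1) everything else -->
          <xsl:otherwise>1</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:value-of select="."/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With your four sample elements in the order shown, this should put the
first two into one group and give the third and the fourth a group each. If
your rule C means something different, only the group-starting-with pattern
should need to change.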
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Yves Forkl [mailto:Y.Forkl@xxxxxx]
> Sent: 08 February 2007 16:48
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: xsl:analyze-string problem
>
> Hi XSLT 2.0 wizards,
>
> while the syntax and semantics of xsl:analyze-string have
> become clear to me, I am now in search of an idiom involving
> it that could help me solve this problem. (Or maybe of an
> alternative...)
>
> In the input I find elements like these:
>
> 1) <e> def ghi</e>
> 2) <e> abc 22 def 3 ghi 1. </e>
> 3) <e> 2. </e>
> 4) <e> 3. def 35 78 ghi </e>
>
> The possible contents fit into exactly 4 classes:
>
> 1) just some words and/or numbers
> 2) like 1), but followed by a number and a period
> 3) just a number and a period
> 4) like 3), but followed by some words and/or numbers
>
> In each case, spaces may or may not appear at beginning and
> end of the content and must be preserved (no matter to which
> group they get attached).
>
> The problem consists of replacing the original "e" element by
> creating new elements according to these rules:
>
> A) A number followed by a period goes into an "ordinal" element.
> B) Words and numbers go into a "text" element.
> C) In cases 1) and 4), where words and numbers appear at the
> end, the content of the current "e" element must be
> concatenated with all adjacent "e" elements of type 1) and 2)
> before putting it all into the "text" element. By contrast,
> in cases 2) and 3), which end with a number and a period,
> the contents of the following "e" instance should never be appended.
>
> My approach is to use the following templates:
>
> <xsl:template match="e">
>
> <xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">
>
> <xsl:matching-substring>
>
> <xsl:for-each select="regex-group(1)">
> <xsl:call-template name="create_element_and_space">
> <xsl:with-param name="new_element_name" select="'text'"/>
> </xsl:call-template>
> </xsl:for-each>
>
> <xsl:for-each select="regex-group(2)">
> <xsl:call-template name="create_element_and_space">
> <xsl:with-param name="new_element_name"
> select="'ordinal'"/>
> </xsl:call-template>
> </xsl:for-each>
>
> <xsl:for-each select="regex-group(3)">
> <xsl:call-template name="create_element_and_space">
> <xsl:with-param name="new_element_name" select="'text'"/>
> </xsl:call-template>
> </xsl:for-each>
>
> </xsl:matching-substring>
>
> </xsl:analyze-string>
>
> <xsl:apply-templates select="following-sibling::e[1]"/>
>
> </xsl:template>
>
>
> <!-- helper template for squeezing spaces out into mixed content -->
> <xsl:template name="create_element_and_space">
> <xsl:param name="new_element_name"/>
>
> <xsl:analyze-string select="." regex="^\s+|\s+$">
>
> <xsl:matching-substring>
> <xsl:value-of select="."/>
> </xsl:matching-substring>
>
> <xsl:non-matching-substring>
> <xsl:element name="{$new_element_name}">
> <xsl:value-of select="."/>
> </xsl:element>
> </xsl:non-matching-substring>
>
> </xsl:analyze-string>
>
> </xsl:template>
>
>
> What is not clear to me is:
>
> - whether the regex actually suffices to match the rules
>
> - if it is a good idea to use xsl:for-each there
>
> - how to ensure concatenation of all the "e" instances'
> contents in cases 1) and 4) without processing them
> repeatedly - i.e.: how can I restrict the call to
> xsl:apply-templates to cases 2) and 3)?
>
> Any comments would be greatly appreciated.
>
> Yves