Subject: Re: find capital letters in string and split it
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Mon, 10 Feb 2003 04:44:18 -0800 (PST)
---- "bryan" <bry@xxxxxxxxxx> wrote:
> In Rdf/Xml it's often the habit to camel-case strings in IDs and
> such.
>
> Let's suppose I want to split the string at the upper case letters,
> the easiest way I can see to do that (the only way that pops into my
> mind) is to parse the string twice, using translate() and replacing
> upper-case letters with a string sequence not very likely to occur
> normally, and then reparse the string splitting it at these
> occurrences. This is of course resource intensive and not foolproof.
> Anybody have any thoughts on how to do this?
Hi Bryan,
It seems to me that you want to preserve the capital letters? If *not*,
then the following is a most straightforward solution using the
"str-split-to-words" template of FXSL:
This transformation:
-------------------
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >
  <xsl:import href="strSplit-to-Words.xsl"/>

  <!-- This transformation must be applied to:
       testSplitToWords4.xml
  -->
  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:variable name="vCaps"
       select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:call-template name="str-split-to-words">
      <xsl:with-param name="pStr" select="/*"/>
      <xsl:with-param name="pDelimiters"
                      select="$vCaps"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
when applied against this source.xml:
<t>thisIsACamelCasedWord</t>
Produces:
<word>this</word>
<word>s</word>
<word>amel</word>
<word>ased</word>
<word>ord</word>
In case you need to preserve the capital letters, the solution is
slightly different. A first pass over the string inserts a space in
front of every capital letter. The newly produced string is then
tokenised on spaces. The first pass uses the "str-map" template
from FXSL.
This transformation:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:myMark="f:MarkAnUppercase"
 exclude-result-prefixes="myMark"
 >
  <xsl:import href="str-map.xsl"/>
  <xsl:import href="strSplit-to-Words.xsl"/>

  <!-- This transformation must be applied to:
       testSplitToWords4.xml
  -->
  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:variable name="vCaps"
       select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <myMark:myMark/>

  <xsl:template match="myMark:*">
    <xsl:param name="arg1"/>
    <xsl:if test="contains($vCaps, $arg1)">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="$arg1"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:variable name="vSpaceDelimited">
      <xsl:call-template name="str-map">
        <xsl:with-param name="pFun"
                        select="document('')/*/myMark:*[1]"/>
        <xsl:with-param name="pStr" select="/*"/>
      </xsl:call-template>
    </xsl:variable>

    <xsl:call-template name="str-split-to-words">
      <xsl:with-param name="pStr" select="$vSpaceDelimited"/>
      <xsl:with-param name="pDelimiters"
                      select="' '"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
when applied against the same source.xml produces:
<word>this</word>
<word>Is</word>
<word>A</word>
<word>Camel</word>
<word>Cased</word>
<word>Word</word>
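If FXSL is not available, the same result can probably be reached with a
single recursive pass in plain XSLT 1.0. The following is only a minimal
sketch, not part of FXSL: the template name "split-camel" and its
parameters "pStr" and "pWord" are made up for this illustration. It walks
the string one character at a time, accumulates the current word, and
emits a <word> element whenever an upper-case letter starts the next word.

```xslt
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:variable name="vCaps"
       select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:call-template name="split-camel">
      <xsl:with-param name="pStr" select="string(/*)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper (an assumption of this sketch, not an FXSL
       template). pStr is the text still to be scanned, pWord is the
       word accumulated so far. -->
  <xsl:template name="split-camel">
    <xsl:param name="pStr"/>
    <xsl:param name="pWord" select="''"/>

    <xsl:variable name="vChar" select="substring($pStr, 1, 1)"/>

    <xsl:choose>
      <!-- End of input: flush the last word, if there is one -->
      <xsl:when test="not($vChar)">
        <xsl:if test="$pWord">
          <word><xsl:value-of select="$pWord"/></word>
        </xsl:if>
      </xsl:when>

      <!-- A capital letter closes the current word and starts a new one -->
      <xsl:when test="contains($vCaps, $vChar) and $pWord">
        <word><xsl:value-of select="$pWord"/></word>
        <xsl:call-template name="split-camel">
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pWord" select="$vChar"/>
        </xsl:call-template>
      </xsl:when>

      <!-- Any other character is appended to the current word -->
      <xsl:otherwise>
        <xsl:call-template name="split-camel">
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pWord" select="concat($pWord, $vChar)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the same source.xml this should give the same six words as
above. Note that the recursion depth grows with the length of the string,
so this is reasonable for IDs and similar short values but may hit the
processor's recursion limit on very long input. Only the 26 ASCII capitals
in $vCaps are treated as word boundaries.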
Hope this helped.
=====
Cheers,
Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list