Subject: Re: Katakana substitution regex
From: Lars Huttar <lars_huttar@xxxxxxx>
Date: Fri, 06 Aug 2010 15:57:07 -0500
On 8/6/2010 3:14 PM, Hoskins & Gretton wrote:
> Hi, I have to convert some Katakana strings from "original" to "new"
> by adding ー (#x30fc;), a pronunciation character (see
> http://www.fileformat.info/info/unicode/char/30fc/index.htm).
> In Japanese, there aren't any word boundaries, so essentially all of
> my search strings are substrings of the text of the current element.
> When substring "a" is followed by the character ー I do not want
> to make the replacement.
>
> example: ブラウザ is a search string
> but it is followed by ー already -- do nothing
>
> When substring "a" is not followed by the character ー I want to
> make the replacement to create "a" followed by ー.
>
> example: ブラウザ is a search string
> but it is not followed by #x30fc; already
> add to the end to make it
> ブラウザー
>
> If I was going to just add the ー, I was able to do that with a
> regex that contained the strings that I wanted to find by using regex
> and analyze-string, where $regexSearch contains all of my search
> Katakana strings:
>
> <xsl:analyze-string select="." regex="({$regexSearch})">
> <xsl:matching-substring>
> <xsl:value-of select="regex-group(1)"/>
> <xsl:text>ー</xsl:text>
> </xsl:matching-substring>
> <xsl:non-matching-substring>
> <xsl:value-of select="."/>
> </xsl:non-matching-substring>
> </xsl:analyze-string>
> However, I can't figure out how I should fit this into an overall
> XSLT, where I need to check ahead in the element text before I
> decide to make the substitution. Currently, if there is a
> string: ブラウザー
> it becomes: ブラウザーー
> (doubling the last character).
>
> If someone has some experience with this type of search and replace
> problem, I would appreciate some guidance.
> Regards, Dorothy
>
>
How about
select="replace(., 'ザ([^ー])', 'ザー$1')"
?
That pattern needs a character after ザ, so it will not catch ザ when
it occurs at the end of a text node. To cover that case, wrap the
result in a second call:
replace(replace(., 'ザ([^ー])', 'ザー$1'), 'ザ$', 'ザー')
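
For the full list of search strings, the same idea might be folded into
the analyze-string approach quoted above. This is only a minimal sketch,
assuming an XSLT 2.0 processor: the regex optionally consumes a ー that
already follows the match, and the template then always writes exactly
one ー. The value of $regexSearch shown here (including サーバ), the
identity template, and the priority setting are hypothetical, and each
alternative is assumed to be non-empty and free of regex metacharacters.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical search list: alternatives joined with | -->
  <xsl:variable name="regexSearch" select="'ブラウザ|サーバ'"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Higher priority so text nodes are handled here, not by the identity template -->
  <xsl:template match="text()" priority="1">
    <!-- The trailing ー? swallows an existing long-vowel mark, if any -->
    <xsl:analyze-string select="." regex="({$regexSearch})ー?">
      <xsl:matching-substring>
        <!-- Emit the search string followed by exactly one ー -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>ー</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Everything else passes through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the optional ー is part of the match, ブラウザ and ブラウザー
should both come out as ブラウザー, and a match at the end of the text
node needs no separate handling.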
HTH,
Lars