Subject: Re: segmenting a paragraph
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Tue, 2 Oct 2007 11:15:21 +0100
On 02/10/2007, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> When you need to apply regex matching to text that crosses node boundaries,
> in the past two approaches have been proposed:
>
> (a) create a string in which the node boundaries are represented by some
> recognizable textual markup (you could use saxon:serialize()), then apply
> the regex processing, then reinstate the node structure (e.g. by using
> saxon:parse()).
Provided the <note> elements don't break a sentence, it's not needed, is it?
E.g. for:
"First sentence. <note>a note.</note> Second sentence."
<xsl:template match="p">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="p/text()">
  <xsl:analyze-string ...
</xsl:template>
<xsl:template match="p/*">
  <xsl:copy-of select="."/>
</xsl:template>
...should meet the requirements as described. I guess you're
expecting the requirements to change to "First <note>note</note>
sentence."?
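
For what it's worth, here is a minimal sketch of how the three templates above might look when filled in. The regex and the <s> wrapper element are my own assumptions for illustration (the thread doesn't specify either), and the sentence pattern is deliberately naive: it would split on abbreviations such as "e.g.", and it only works while each sentence sits wholly inside one text node.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Visit the paragraph's children in document order. -->
  <xsl:template match="p">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Segment each text node independently.
       Assumption: a sentence is a run of non-terminator characters
       followed by one or more of . ! ?
       Assumption: <s> is a hypothetical wrapper element. -->
  <xsl:template match="p/text()">
    <xsl:analyze-string select="." regex="[^.!?]+[.!?]+">
      <xsl:matching-substring>
        <s><xsl:value-of select="normalize-space(.)"/></s>
      </xsl:matching-substring>
      <!-- Pass through anything that is not a complete sentence,
           e.g. whitespace between sentences. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Child elements such as <note> are copied through untouched. -->
  <xsl:template match="p/*">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With the "First sentence. <note>a note.</note> Second sentence." input, each of the two text nodes is a whole sentence, so this per-node approach should be enough. With "First <note>note</note> sentence." the text node "First " contains no terminator and would fall through unmatched, which is exactly the case where the serialize/parse approach would be needed.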
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/