Subject: Re: Matching only text nodes with certain (complicated) properties
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 09 Jan 2009 13:11:02 -0500
David,
You are getting the false matches because you are also matching the
whitespace-only text nodes. So adjust further:
body//text()[normalize-space()][preceding::text()[normalize-space()][1]
<< preceding::pb[1]]
This ought to preclude getting that page info more than once, too. If
it doesn't, let us know.
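As a complete XSLT 2.0 stylesheet, the revised pattern might look like the sketch below. It assumes, as the original match pattern does, that the TEI elements are in no namespace (otherwise add xpath-default-namespace to xsl:stylesheet). Two details matter. Inside a match attribute, << has to be written as &lt;&lt;. And because a text node has no children, the original template's <xsl:apply-templates/> selects nothing, which is why "The " is missing from the quoted output; the sketch copies the matched text with xsl:value-of instead.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches only non-whitespace text nodes in <body> whose nearest
       preceding non-whitespace text node comes before the nearest
       preceding <pb/>: the first real text after each page break.
       Note that "<<" must be escaped in the attribute value. -->
  <xsl:template match="body//text()[normalize-space()]
                         [preceding::text()[normalize-space()][1] &lt;&lt; preceding::pb[1]]">
    <span class="pagenumber">page <xsl:value-of select="preceding::pb[1]/@n"/></span>
    <!-- A text node has no children, so xsl:apply-templates would output
         nothing here; copy the matched text itself instead. -->
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- The other templates of the real stylesheet are omitted. -->

</xsl:stylesheet>
```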
Another approach to this -- and an attractive one, since your schema
is known and your data well-controlled -- is to use xsl:strip-space
to strip whitespace from elements you know will never have any
significant whitespace, namely those whose content model in the schema
allows only child elements. Then you wouldn't need the extra filtering
with normalize-space().
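A sketch of the xsl:strip-space variant follows. The element list (body, div, list) is an assumption about which TEI elements have element-only content in this data; check it against the actual schema. One caveat: only whitespace-only text nodes whose parent is listed are stripped. The newline after <pb n="26"/> sits inside the mixed-content <item>, so it survives, and with the simplified pattern the page label lands on that node rather than on "The ". It should still appear only once per page break.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed element-only containers (verify against the schema):
       whitespace-only text children of these are dropped from the input. -->
  <xsl:strip-space elements="body div list"/>

  <!-- No normalize-space() filters: matches the first surviving text node
       after each <pb/>, provided some text node precedes that <pb/>. -->
  <xsl:template match="body//text()[preceding::text()[1] &lt;&lt; preceding::pb[1]]">
    <span class="pagenumber">page <xsl:value-of select="preceding::pb[1]/@n"/></span>
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```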
Cheers,
Wendell
At 12:53 PM 1/9/2009, you wrote:
Only now, reading your replies, do I understand what
"preceding::" actually matches. Thanks!
Good clue with normalize-space(), Wendell, but whitespace still
somehow seems to be a problem:
An excerpt from the original XML document (TEI):
...
<item n="c">The <mentioned>i</mentioned> of the nom. before a vowel in the RV.
<pb n="26"/>
<list>
<item n="a">The <mentioned>i</mentioned> of the ...
...
after applying the following XSLT 2.0 template (among others; the
"body//" part of the match ensures that only text nodes from the
<body> of the document are considered):
<xsl:template
    match="body//text()[preceding::text()[normalize-space()][1] &lt;&lt;
                        preceding::pb[1]]">
  <span class="pagenumber">page <xsl:value-of
      select="preceding::pb[1]/@n"/></span>
  <xsl:apply-templates/>
</xsl:template>
becomes:
...
<li>The <span class="ved">i</span> of the nom. before a vowel in the RV.
<span class="pagenumber">page 26</span>
<ol style="list-style-type:lower-greek">
<span class="pagenumber">page 26</span>
<li>
<span class="pagenumber">page 26</span>
<span class="ved">i</span> of the ...
...
As you can see, <span>s are added not just before the very first text
node. They appear just around the places where I have a <pb/> in the
original, so I suppose it has to do with whitespace (there's always
one empty line before and after <pb/> in the source XML).
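One way to test that suspicion is a throwaway diagnostic stylesheet (a sketch; the report format is arbitrary) that lists every text node selected by the original predicate and flags the whitespace-only ones:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- One report line per text node selected by the original, unfiltered
       predicate: real text is shown in brackets, whitespace is flagged. -->
  <xsl:template match="/">
    <xsl:for-each select="//body//text()[preceding::text()[normalize-space()][1]
                                         &lt;&lt; preceding::pb[1]]">
      <xsl:value-of select="concat('page ', preceding::pb[1]/@n, ': ',
          if (normalize-space()) then concat('[', ., ']') else '(whitespace only)')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Any "(whitespace only)" lines in the report confirm that the extra page labels come from whitespace text nodes.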
I'm using Saxon-B 9.1.0.3 for my XSLT 2.0 transformation (in Oxygen).
I'm looking into it some more today, but thank you for your
replies so far.
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================