Subject: RE: Better Way to Group Siblings By Start/End Markers?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 24 Jun 2008 00:04:03 +0100
Another possibility is to use xsl:for-each-group with group-starting-with.
I seem to remember that when I last did this, however, it turned out to be
easier using sibling recursion - that is, have each w:r element
apply-templates to its immediately following sibling.
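
A minimal, untested sketch of the group-starting-with option, offered only as an illustration. It assumes the runs are direct children of w:p, that fields are not nested, and that every field ends in the paragraph where it starts. The output element names (para, field) and the instr attribute are hypothetical placeholders, not INCX.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <xsl:template match="w:p">
    <!-- "para" and "field" are hypothetical output names -->
    <para>
      <!-- A new group starts at each field-begin run, and at the run
           that immediately follows a field-end run. -->
      <xsl:for-each-group select="w:r"
          group-starting-with="
            w:r[w:fldChar/@w:fldCharType = 'begin']
            | w:r[preceding-sibling::w:r[1]/w:fldChar/@w:fldCharType = 'end']">
        <xsl:choose>
          <!-- The group is a field if its first run is a begin marker -->
          <xsl:when test="current-group()[1]/w:fldChar/@w:fldCharType = 'begin'">
            <field instr="{normalize-space(
                             string-join(current-group()/w:instrText, ''))}">
              <xsl:value-of select="current-group()/w:t" separator=""/>
            </field>
          </xsl:when>
          <!-- Otherwise it is a stretch of ordinary runs -->
          <xsl:otherwise>
            <xsl:value-of select="current-group()/w:t" separator=""/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample quoted below, this should produce two groups: the leading "- " run on its own, then the five runs from the begin marker to the end marker. Because group-starting-with only marks where groups start, the second pattern is what stops a field group from swallowing the runs after its end marker.
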
Either way, processing Word XML using XSLT is not for the faint-hearted.
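
The sibling-recursion alternative might be sketched as follows. Again this is untested and makes the same assumptions (no nested fields, each field closed within its paragraph). The element names para, field and instr and the mode name in-field are hypothetical.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <!-- Enter the chain at the first run only; each run then passes
       control to its immediately following sibling. -->
  <xsl:template match="w:p">
    <para>
      <xsl:apply-templates select="w:r[1]"/>
    </para>
  </xsl:template>

  <!-- Ordinary run: emit its text, then move to the next run. -->
  <xsl:template match="w:r">
    <xsl:value-of select="w:t" separator=""/>
    <xsl:apply-templates select="following-sibling::w:r[1]"/>
  </xsl:template>

  <!-- Field start: walk the field's runs in a separate mode, then
       resume normal processing after the matching end marker. -->
  <xsl:template match="w:r[w:fldChar/@w:fldCharType = 'begin']">
    <xsl:variable name="end" select="
        following-sibling::w:r[w:fldChar/@w:fldCharType = 'end'][1]"/>
    <field>
      <xsl:apply-templates select="following-sibling::w:r[1]"
                           mode="in-field"/>
    </field>
    <xsl:apply-templates select="$end/following-sibling::w:r[1]"/>
  </xsl:template>

  <!-- Run inside a field: emit instruction and result text, continue. -->
  <xsl:template match="w:r" mode="in-field">
    <xsl:for-each select="w:instrText">
      <instr><xsl:value-of select="."/></instr>
    </xsl:for-each>
    <xsl:value-of select="w:t" separator=""/>
    <xsl:apply-templates select="following-sibling::w:r[1]"
                         mode="in-field"/>
  </xsl:template>

  <!-- Field end: produce nothing, which stops the in-field chain. -->
  <xsl:template match="w:r[w:fldChar/@w:fldCharType = 'end']"
                mode="in-field"/>

</xsl:stylesheet>
```

The state that the grouping key has to recompute for every run (am I inside a field?) is carried here by the mode, which is why the individual templates stay small.
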
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx]
> Sent: 23 June 2008 23:04
> To: xsl-list
> Subject: Better Way to Group Siblings By Start/End Markers?
>
> I am experimenting with using XSLT to convert Office Open XML
> into InCopy INCX (the CS3 Word import fails to capture some
> things I need captured from the Word data).
>
> One challenge is handling Word fields, which need to be
> converted to any number of different, and
> differently-structured, INCX constructs (whose details are
> not important here).
>
> A Word field is organized as a sequence of w:r elements
> within a larger sequence of w:r elements. A field start is
> indicated by a w:r with a field start indicator and the field
> end is indicated by another w:r with a field end indicator.
> The w:r elements between these two marker elements comprise
> the field data, which can be any number of things, including
> w:r elements that could just as easily occur outside the scope
> of the field (e.g., a w:r containing literal document content).
>
> Here is a typical sample:
>
> <w:r>
>   <w:t xml:space="preserve">- </w:t>
> </w:r>
> <w:r w:rsidR="00BA1D13">
>   <w:fldChar w:fldCharType="begin"/>
> </w:r>
> <w:r w:rsidR="00BA1D13">
>   <w:instrText>HYPERLINK "http://www.example.com/"</w:instrText>
> </w:r>
> <w:r w:rsidR="00BA1D13">
>   <w:fldChar w:fldCharType="separate"/>
> </w:r>
> <w:r w:rsidRPr="00B233E5">
>   <w:t>HTTP</w:t>
> </w:r>
> <w:r w:rsidR="00BA1D13">
>   <w:fldChar w:fldCharType="end"/>
> </w:r>
>
> I have this for-each-group that seems to group correctly, but
> I'm wondering if there's a simpler expression that does what I want:
>
> <xsl:for-each-group select="w:r"
>   group-adjacent="
>     string(
>       self::*[w:fldChar[@w:fldCharType = 'begin' or @w:fldCharType = 'end']]
>       or
>       (self::*[preceding-sibling::*/w:fldChar[@w:fldCharType = 'begin']]
>        and
>        self::*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']]
>        and
>        count(
>          (self::*[preceding-sibling::*/w:fldChar[@w:fldCharType = 'begin']])[1]
>            /(*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']])[1]
>          |
>          (self::*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']])[1]
>        ) = 1
>       )
>     )
>   ">
>
> In prose (at least this is what I intend the above expression
> to mean): if the w:r has a child w:fldChar whose @w:fldCharType
> is 'begin' or 'end', or if the w:r has both a preceding sibling
> w:r with a w:fldChar of type 'begin' and a following sibling
> w:r with a w:fldChar of type 'end' AND the nearest preceding
> sibling field start has the same nearest following sibling
> field end as the current node, then return the grouping key
> "true"; otherwise return the grouping key "false".
>
> Whew.
>
> I can't think of a simpler way to say this. Is there one?
>
> I realize I could factor some of the complexity of the
> expression out into a function or two, which I will probably do.
>
> Thanks,
>
> Eliot
>
> ----
> Eliot Kimber | Senior Solutions Architect | Really Strategies, Inc.
> email: ekimber@xxxxxxxxxxxx
> office: 610.631.6770 | cell: 512.554.9368
> 2570 Boulevard of the Generals | Suite 213 | Audubon, PA 19403
> www.reallysi.com | http://blog.reallysi.com | www.rsuitecms.com