Subject: RE: How To Calculate Set of Unique Values Across a Tree of Input Documents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 21 Mar 2008 19:11:30 -0000
There was a recent thread on processing graphs in XSLT 2.0; see
http://markmail.org/message/tlletsiznepd5no6

I provided a (sketch of a) solution that involved listing all the paths
starting at a given node (while avoiding looping in the event of a cycle); a
simple adaptation of that will give you all the nodes reachable from a given
node. In your case the node identifiers can be obtained using
document-uri(); you then simply need to apply distinct-values() to the
returned sequence of URIs.
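
A minimal sketch of that approach in XSLT 2.0 might look like the
following. This is illustrative only: the f: namespace, the entry-point
template, and the assumption that every reference is made through an
@href attribute resolved against the referencing document are mine, not
part of the earlier thread.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:reachable">

  <!-- Return the URI of $doc plus the URIs of every document
       reachable from it. $visited holds the URIs already seen on
       the current path, which prevents infinite recursion when the
       references form a cycle. -->
  <xsl:function name="f:reachable" as="xs:string*">
    <xsl:param name="doc" as="document-node()"/>
    <xsl:param name="visited" as="xs:string*"/>
    <xsl:variable name="uri" select="string(document-uri($doc))"/>
    <xsl:if test="not($uri = $visited)">
      <xsl:sequence select="$uri"/>
      <xsl:for-each select="$doc//*[@href]">
        <!-- document() resolves @href relative to the base URI of
             the referencing element -->
        <xsl:sequence select="f:reachable(document(@href, .),
                                          ($visited, $uri))"/>
      </xsl:for-each>
    </xsl:if>
  </xsl:function>

  <xsl:template match="/">
    <!-- Two different paths can still reach the same document, so
         distinct-values() does the final pruning. -->
    <xsl:for-each select="distinct-values(f:reachable(/, ()))">
      <uri><xsl:value-of select="."/></uri>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Note that, like the path-listing solution in the earlier thread, this
only avoids looping on the current path: a document shared by several
branches may be traversed more than once, with the duplicates removed at
the end by distinct-values().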
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx]
> Sent: 21 March 2008 18:52
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: How To Calculate Set of Unique Values Across a
> Tree of Input Documents
>
> I have a tree of DITA map documents where each map references
> zero or more other map or topic documents. The same map or
> topic could be referenced multiple times.
>
> I need to calculate the "bounded object set" of unique
> documents referenced from within the compound map so that I
> can then use an XSLT process to create new copies of each
> document. Since I can't write to a given result more than
> once, I have to remove any duplicates first.
>
> Each target document is referenced by a relative URI that can
> be different for different references to the same file (and
> in fact will almost always be different in my particular data set).
>
> I am using XSLT 2.
>
> Because key() tables are bound to input documents, I don't
> think I can build a table of references indexed by target
> document URI (that is, the absolute URI of the target of the
> reference). If I could, I would simply build that table and
> then process the first member of each entry.
>
> I can't think of any other efficient way to approach this.
> The best idea I can come up with is to build an intermediate
> document that records each document reference, and then use
> something like for-each-group on it to treat it as a set,
> so that each referenced file is processed exactly once. If I
> build a flat list of elements containing the document URI of
> each reference, I can easily sort the values and remove
> duplicates. So maybe that's as efficient as anything else
> would be.
>
> My other challenge is that my input data set is very large,
> so I have the potential to run into memory issues; it may be
> that writing out an intermediate file as part of a
> multi-stage, multi-transform pipeline is the best approach.
> However, my current processor will handle the entire data set
> in one process for the purpose of applying the (mostly)
> identity transform to the map set.
>
> Can anyone suggest other solution approaches to this problem?
>
> Once again I feel like I might be missing a clever solution
> hidden in the haze of my XSLT 1 brain damage.
>
> Thanks,
>
> Eliot
>
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com
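
For the intermediate-document idea in the quoted message, the grouping
phase can be sketched along these lines. Again a sketch only: the shape
of the intermediate document (a refs root holding one ref element with a
@uri attribute per reference, each @uri already absolutized) is an
assumption I am making for illustration.

<!-- Phase 2, run over the intermediate document. for-each-group
     collapses all references to the same target into one group,
     so each referenced document is processed exactly once. -->
<xsl:template match="/refs">
  <xsl:for-each-group select="ref" group-by="@uri">
    <!-- current-grouping-key() is the absolute URI shared by
         every reference in this group -->
    <xsl:apply-templates select="document(current-grouping-key())"/>
  </xsl:for-each-group>
</xsl:template>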