Subject: Re: efficient traversal of combined collections in XSLT 3.0
From: Graydon <graydon@xxxxxxxxx>
Date: Tue, 27 Nov 2012 04:58:22 -0500
On Sat, Nov 24, 2012 at 03:27:24PM +0000, Michael Kay scripsit:
> The way we do this in maintaining the XSLT/XQuery specs (admittedly
> much smaller than your 4GB) is to maintain a derived document
> containing a list of valid link targets. This is regenerated when
> the base documents change, which is less frequently than the list is
> used. The list of valid anchors is much smaller than the base
> documents, so it can be loaded more quickly, and uses less memory.
That gets saxon:discard-document() to work. (Well, up until the point
the transform fails with no error message _and_ closing the outer loop;
something, somewhere, is awful in the input, which is not a surprise
but is hard to find!)
I _suspect_, but could not take the time to prove, that the use of

    for $x in collection($pathToContent) return
      (saxon:discard-document($x)//link,
       saxon:discard-document($x)//target[not(.//link)])

means that discard-document can't tell it is supposed to let go of the
document. Separating those out into distinct for-each statements made
things behave in a much more useful fashion.
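
For the archive, a minimal sketch of what "distinct for-each statements" could look like. This is an illustration, not the actual stylesheet: the `anchors` wrapper element, the `main` template name, and the use of xsl:copy-of are assumptions; only `$pathToContent`, `link`, `target`, and the saxon:discard-document() calls come from the expression above. It also assumes a Saxon edition that provides the saxon: extension functions.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="xs saxon">

  <!-- Collection URI of the base documents, supplied by the caller -->
  <xsl:param name="pathToContent" as="xs:string"/>

  <!-- Hypothetical entry point; builds the derived list of links and targets -->
  <xsl:template name="main">
    <anchors> <!-- hypothetical wrapper element -->

      <!-- First pass: one discard-document() call per document, links only -->
      <xsl:for-each select="collection($pathToContent)">
        <xsl:copy-of select="saxon:discard-document(.)//link"/>
      </xsl:for-each>

      <!-- Second pass: targets that do not themselves contain a link -->
      <xsl:for-each select="collection($pathToContent)">
        <xsl:copy-of select="saxon:discard-document(.)//target[not(.//link)]"/>
      </xsl:for-each>

    </anchors>
  </xsl:template>

</xsl:stylesheet>
```

Each loop body touches a given document through a single discard-document() call, rather than two calls on the same `$x` inside one expression. Whether that is really why the split version behaves better is still only my suspicion. The cost is that the collection is read twice.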
> Also, generating the list of anchors is an operation that can be
> streamed; hopefully the resulting list is small enough that it can
> be held in memory for look-up purposes.
It can; once I've got the list of anchors the compare runs in about
fifteen seconds.
Thank you!
-- Graydon, who keeps getting freaked out by the orders-of-magnitude
run-time differences from apparently small code changes