Subject: RE: Processing Memory-Hungry Data Sets with XSLT 2
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 12 Mar 2008 00:05:04 -0000
Almost any performance question is processor-specific to some extent.
However, it's not unlikely that different processors use similar
implementation techniques much of the time.
Given your description of the problem, I would be looking for unnecessary
temporary trees and copy operations. With Saxon it's usually the case that
tree-construction (xsl:variable with content and no "as" attribute) is done
eagerly, whereas sequence construction (xsl:variable with a select
attribute) is done lazily.
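Here is a minimal illustrative sketch of the two forms. It assumes DITA-style map and topicref elements. The "temp-tree" mode name is hypothetical and was chosen only for this example. The eager/lazy behaviour described in the comments is Saxon's usual behaviour as stated above, not a guarantee for every processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Temporary tree: xsl:variable with content and no "as" attribute.
       Every selected topicref is copied under a new document node, which
       Saxon usually builds eagerly. -->
  <xsl:template match="map" mode="temp-tree">
    <xsl:variable name="refs">
      <xsl:copy-of select=".//topicref"/>
    </xsl:variable>
    <xsl:apply-templates select="$refs/topicref"/>
  </xsl:template>

  <!-- Sequence: xsl:variable with a select attribute. The variable holds
       references to the existing nodes, so nothing is copied; Saxon
       usually evaluates this lazily. -->
  <xsl:template match="map">
    <xsl:variable name="refs" select=".//topicref"/>
    <xsl:apply-templates select="$refs"/>
  </xsl:template>

</xsl:stylesheet>
```

Sometimes a variable genuinely needs content rather than a select expression. In that case, adding an "as" attribute (for example as="element()*") and using xsl:sequence instead of xsl:copy-of returns the existing nodes rather than building a new tree.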
But with performance the devil is always in the detail, and sometimes it can
be in quite surprising places in the detail.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx]
> Sent: 11 March 2008 19:51
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Processing Memory-Hungry Data Sets with XSLT 2
>
> I'm implementing some DITA processing that is applied against
> a large tree of maps and topics referenced from the maps in
> order to generate HTML from the maps and the topics. There
> are tens of thousands of maps and topics.
>
> I have two processors: one is essentially an identity
> transform that processes the map tree and copies it to the
> output with a little bit of modification. The other is the
> XML-to-HTML transform. It is still essentially a one-to-one
> file-to-file transform but the result files are HTML instead
> of copies. The process does a top-down traversal of
> the tree of maps, which consist of either links to submaps or
> links to topics. Submaps are loaded and their topic links
> processed. Links to topics result in loading the target
> topics and processing them normally to generate HTML output.
> This obviously results in a lot of source and target
> documents in memory. The business logic is very simple; it's
> just a lot of data.
>
> Using Saxon 9, the first script can process my entire corpus,
> but the second one (the HTML generator) fails about halfway
> through with an out-of-memory error, using the largest VM I
> can request under OS X (2 GB).
>
> I tried using Saxon's discard-document() extension function, but
> that appeared to have no effect (I didn't really expect it to
> since I don't think anything referenced ever gets unreferenced).
>
> My question is, are there any XSLT 2 techniques that might
> help avoid this type of memory usage issue that are generic
> (as opposed to Saxon-specific)? I can think of several
> multi-pass approaches involving the creation of intermediate
> files that would work, but time is short, so I'm trying to keep
> this as simple as I can and still have it work, so I was
> hoping there might be some clever way to make an otherwise
> naive top-down process more memory efficient.
>
> If the only answer is Saxon-specific then I'll move my
> question to the Saxon list.
>
> Thanks,
>
> Eliot
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com