Subject: RE: use XSLT or XQuery in Saxon?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 6 Jan 2005 12:10:46 -0000
> > I have an extremely large (over 300 MB) XML file and tens
> > of thousands of small XML files generated by
> > applying various XSLT stylesheets to the one big XML file.
>
> I don't know whether Mr Kay has tested Saxon with 100+MB files or not,
> but we did (6.5.?), and could not get a simple transform to complete
> within hours (I think we gave up after ~4 hours on an 80-100MB file),
> on a machine with 1GB of RAM.
I've only gone up to about 50MB myself, but I know of users who've gone up
to 200MB.

For one Saxonica client I managed to get the processing time for a 40MB
transformation down from 90 minutes to 45 seconds. Once you've allocated
enough memory, if it still takes hours then it's because there's a
non-linearity in the stylesheet logic, and this can usually be eliminated by
careful use of keys, sorting, or grouping.
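
To illustrate the point about keys, here is a minimal sketch, not taken from any real client stylesheet. It assumes a hypothetical document of `customer` and `order` elements and a hypothetical key named `cust-by-id`. Resolving each order with a predicate such as `//customer[@id = current()/@cust]` can force a scan of every customer for every order; declaring an `xsl:key` lets the processor build an index once and look each customer up through it. The stylesheet is run here through the standard JAXP API rather than any Saxon-specific one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyLookupSketch {
    // Hypothetical stylesheet: each order finds its customer through an
    // xsl:key index instead of rescanning all customers per order.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='cust-by-id' match='customer' use='@id'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='data/order'>"
      + "<xsl:value-of select=\"key('cust-by-id', @cust)/@name\"/>"
      + "<xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML string.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data>"
            + "<customer id='c1' name='Ann'/><customer id='c2' name='Bob'/>"
            + "<order cust='c2'/><order cust='c1'/><order cust='c2'/>"
            + "</data>";
        System.out.println(transform(xml)); // one customer name per order
    }
}
```

On a document this small the difference is invisible; the point is only that the work per order stops depending on the number of customers.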
But I do agree with you that there are some problems that are better tackled
with a SAX-based Java application, or sometimes with a SAX filter as a
precursor to an XSLT transformation.
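
As a rough sketch of the "SAX filter as a precursor" idea, the following uses the standard `XMLFilterImpl` and `SAXSource` classes to strip unwanted subtrees before the transformer ever builds a tree. The element name `debug` and the class names are assumptions made up for the illustration, and the identity transformer stands in for a real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Drops every element with the given local name, including its subtree.
class DropElementFilter extends XMLFilterImpl {
    private final String dropName;
    private int depth = 0; // greater than 0 while inside a dropped subtree

    DropElementFilter(XMLReader parent, String dropName) {
        super(parent);
        this.dropName = dropName;
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) throws SAXException {
        if (depth > 0 || local.equals(dropName)) { depth++; return; }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }
}

class FilterThenTransform {
    // Parses xml through the filter, then hands the events to a transformer.
    static String run(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // so local names are reported
            XMLReader filtered = new DropElementFilter(
                spf.newSAXParser().getXMLReader(), "debug");
            // Identity transform; a real stylesheet would be supplied here.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(
                new SAXSource(filtered, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the filter works on the event stream, the discarded subtrees never occupy memory in the transformer's tree, which is where the saving comes from on large inputs.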
Michael Kay
http://www.saxonica.com/
>
> I wrote a custom transformer in Java doing exactly what we needed, using:
> * SAX events
> * Only keeping one branch/leaf of the XML tree in memory at any time.
> * Aggregation of content into small mutable value objects, which were
>   output and discarded when completed.
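
The three points in the list above could look roughly like the following sketch. It is not the transformer described in the quoted message; the `record` element, its `id` attribute, and the class names are assumptions chosen for illustration. Only the record currently being read is held in memory, and it is handed to a sink and dropped as soon as its end tag arrives.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RecordStreamer extends DefaultHandler {
    // Small mutable value object filled in while one record is open.
    static final class Record {
        String id;
        final StringBuilder text = new StringBuilder();
    }

    private final Consumer<String> sink; // stands in for writing an output file
    private Record current;              // null when outside any record

    RecordStreamer(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if (qName.equals("record")) {
            current = new Record();
            current.id = atts.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) current.text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("record")) {
            // Emit the finished record, then discard it.
            sink.accept(current.id + "=" + current.text.toString().trim());
            current = null;
        }
    }

    // Streams the document once; memory use is bounded by the largest record.
    static void process(String xml, Consumer<String> sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new RecordStreamer(sink));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a real multi-hundred-megabyte file the `InputSource` would wrap a file stream rather than a string, and the sink would write each record out instead of collecting it.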
>
> 1500 files, varying from ~10MB to 360MB and totalling ~10GB, could be
> processed at a linear speed of ~2MB per second, or close to the disk
> drive speed, on a dual-CPU workstation.
>
> I suspect that you will end up in 'custom transformer' territory, but
> perhaps Saxon has improved and can deal with the transforms you give it.
> I suggest that you run some simple tests first that roughly resemble
> what you need to do later.
>
>
> Cheers
> Niclas
> --
> ---------------
> If at first you don't succeed, destroy all evidence that you tried.
> - Steven Wright
>
> +---------//-------------------+
> | http://www.dpml.net |
> | http://niclas.hedhman.org |
> +------//----------------------+