Subject: RE: optimization for very large, flat documents
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Thu, 20 Jan 2005 12:20:18 -0600
You're welcome Kevin - please let us know what your findings are!
Cheers,
<prs/>
-----Original Message-----
From: Kevin Rodgers [mailto:kevin.rodgers@xxxxxxx]
Sent: Thursday, 20 January 2005 12:09 p.m.
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: optimization for very large, flat documents
Thanks to everyone who responded. For now I plan to follow Pieter's idea of
chunking the data into manageable pieces (16-64 MB). Then I'm going to look
into Michael's suggestions about STX (unfortunately, not yet a W3C
recommendation and thus not widely implemented) and XQuery.
For anyone interested in some numbers: I've split each of my 2 large files
(613 MB and 656 MB) into subfiles of 16 K independent entries (which vary in
size), yielding sets of 25 and 37 subfiles (approx. 25 MB and 17 MB each,
respectively). I process them by running Saxon 8.2 from the command line
(with an -Xmx value of 8 times the file size) on a Sun UltraSPARC with 2 GB
of real memory. The set of 37 17-MB subfiles is processed with a slightly
simpler stylesheet and takes about 1:15 (minutes:seconds) per file; the set
of 25 25-MB subfiles uses one document() call per entry to/from a servlet on
a different host and takes about 8 minutes per file.
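For what it's worth, the splitting step can be sketched with a streaming
parse, so the full 600+ MB file never has to be materialized in memory. The
element names ("entries"/"entry") and the file-name scheme below are
assumptions, since my post doesn't show the actual markup:

```python
# Sketch of the chunking step: stream a large, flat XML document and
# write successive groups of top-level entries to numbered subfiles.
# Element names and the "chunkNNN.xml" naming scheme are hypothetical.
import copy
import xml.etree.ElementTree as ET

def split_entries(source, chunk_size, entry_tag="entry",
                  root_tag="entries", prefix="chunk"):
    """Split `source` into files of `chunk_size` entries; return their names."""
    out_names = []
    buffer = []

    def flush():
        name = f"{prefix}{len(out_names):03d}.xml"
        root = ET.Element(root_tag)
        root.extend(buffer)
        ET.ElementTree(root).write(name, encoding="utf-8",
                                   xml_declaration=True)
        out_names.append(name)
        buffer.clear()

    # iterparse yields each element as its end tag is seen, so a completed
    # entry can be copied out and the original cleared to bound memory use.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == entry_tag:
            buffer.append(copy.deepcopy(elem))
            elem.clear()
            if len(buffer) == chunk_size:
                flush()
    if buffer:
        flush()  # trailing partial chunk
    return out_names
```

With chunk_size set so each subfile holds 16 K entries, this reproduces the
split described above without ever loading a whole input file.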
My next step is to use Saxon's profiling features to find out where I can
improve my stylesheet's performance.
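As a starting point, the invocation described above can be sketched as a
shell fragment: the heap size is derived from the 8-times-file-size rule,
and Saxon 8.x's -t option prints version and timing information, which is a
cheap first profile. The file and stylesheet names are hypothetical:

```shell
# Compute an -Xmx heap size of 8x the input file, rounded up to whole MB.
heap_mb_for() {
    local bytes
    bytes=$(wc -c < "$1")
    echo $(( (bytes * 8 + 1048575) / 1048576 ))
}

# Hypothetical invocation: Saxon 8.x's command-line entry point is
# net.sf.saxon.Transform; -t reports version and timing to stderr.
# java -Xmx"$(heap_mb_for chunk001.xml)"m net.sf.saxon.Transform \
#     -t -o chunk001.out.xml chunk001.xml stylesheet.xsl
```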
Thanks again to everyone on xsl-list for all your help!
--
Kevin Rodgers