Hi Folks,
Eliot Kimber raised a neat question on the Saxon mailing list.
Here is a summary of the ensuing discussion.
Scenario: There are a million XML documents that need to be transformed. Each
file is in the 1-4 KB range. The files are organized into directories nested
about 4 or 5 levels deep, and some directories hold hundreds or thousands of
files.
Use XSLT to do the transformations.
Specifically, use the XSLT collection() function along with
saxon:discard-document().
Transforming a million files is easily handled by Saxon-EE, which uses
multiple threads for document parsing. Likewise, xsl:result-document uses
multiple threads for writing the results. A key thing to remember is to call
saxon:discard-document() so that each document is released from Saxon's
document pool and can be garbage collected after processing.
Here is the XSLT code:
<xsl:for-each select="for $x in
collection('file:///c:/path/to/xml?select=*.xml;recurse=yes;on-error=ignore')
return saxon:discard-document(f:doTransform($x))">
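To show where such a loop might sit, here is a minimal sketch of a complete
stylesheet. It is not the code from the discussion: instead of calling a
doTransform function, it applies saxon:discard-document() directly to each
input document and does the work in the body of the xsl:for-each. The template
name "main", the output directory /path/to/out/, and the identity template
(a stand-in for the real transformation) are all assumptions made for
illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical entry point: there is no single source document,
       so the transformation starts at a named template. -->
  <xsl:template name="main">
    <xsl:for-each select="for $x in
        collection('file:///c:/path/to/xml?select=*.xml;recurse=yes;on-error=ignore')
        return saxon:discard-document($x)">
      <!-- Assumed output layout: mirror the input tree under /path/to/out/. -->
      <xsl:result-document
          href="{replace(base-uri(.), '/path/to/xml/', '/path/to/out/')}">
        <xsl:apply-templates select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Placeholder transformation: copy everything unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the Saxon command line, a stylesheet like this would be started with
the -it:main option rather than with a source file. Writing the results to
a separate directory keeps a second run from picking up its own output
through the select=*.xml filter.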