Subject: RE: 10,000 document()'s
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 9 Apr 2003 20:38:42 +0100
I would suggest writing a SAX filter that invokes the XSLT
transformations (one transformation for each file) via JAXP, gets the
result back in a StringWriter, and adds an element containing the word
count to the output stream.
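
A minimal sketch of that approach follows; it is an illustration, not code from the original post. The class name WordCountFilter, the helper process(), and the embedded stylesheet TEXT_XSL are hypothetical. It assumes each <filename> holds a local file path and that a "word" is a whitespace-delimited token in the text output of the per-file transformation. Because each file gets its own short-lived Transformer, its source tree should become collectable as soon as the count is taken.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.StringTokenizer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class WordCountFilter extends XMLFilterImpl {
    // Assumed stylesheet: with text output and only the built-in template
    // rules, the result is the concatenation of all text nodes.
    private static final String TEXT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/></xsl:stylesheet>";

    private final Templates templates;   // compiled once, reused for every file
    private StringBuilder filename;      // non-null only while inside <filename>

    public WordCountFilter(XMLReader parent) throws TransformerException {
        super(parent);
        templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(TEXT_XSL)));
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if ("filename".equals(local)) filename = new StringBuilder();
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (filename != null) filename.append(ch, start, length);
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qName);
        if ("filename".equals(local) && filename != null) {
            String count = Integer.toString(countWords(new File(filename.toString().trim())));
            filename = null;
            // Emit <wordcount>N</wordcount> right after </filename>.
            super.startElement("", "wordcount", "wordcount", new AttributesImpl());
            super.characters(count.toCharArray(), 0, count.length());
            super.endElement("", "wordcount", "wordcount");
        }
    }

    // One transformation per file, with the result collected in a StringWriter.
    int countWords(File file) throws SAXException {
        try {
            StringWriter text = new StringWriter();
            templates.newTransformer().transform(new StreamSource(file), new StreamResult(text));
            return new StringTokenizer(text.toString()).countTokens();
        } catch (TransformerException e) {
            throw new SAXException(e);
        }
    }

    // Runs the list document through the filter and serializes the result
    // with an identity transformer.
    static String process(InputSource listDocument) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(
            new SAXSource(new WordCountFilter(reader), listDocument), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(new InputSource(new File(args[0]).toURI().toString())));
    }
}
```

Run it with the path of the list file as the only argument; the augmented list is written to standard output. A real word count would probably need a stylesheet tailored to the documents rather than the bare text dump assumed here.
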
Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx
> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> Peter Binkley
> Sent: 08 April 2003 17:06
> To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> Subject: 10,000 document()'s
>
>
> I need advice on how to tackle this problem: I've got a file
> that contains a list of about 10,000 other files, and I want
> to process the list so as to add a wordcount for each of the
> external files. Something like this:
>
> Input:
>
> <files>
> <file>
> <filename>/path/to/file/2844942.xml</filename>
> </file>
> <file> .... </file>
> </files>
>
> Output:
>
> <files>
> <file>
> <filename>/path/to/file/2844942.xml</filename>
> <wordcount>2938</wordcount>
> </file>
> <file> .... </file>
> </files>
>
> The obvious approach is to use a for-each loop that includes
> a variable that opens the external file using a document()
> call. The problem is that the process inevitably runs out of
> memory, both with Saxon and Xalan. It seems that the
> variables are passing out of scope and being destroyed as
> they should, but I gather from a posting by Michael Kay
> (http://www.biglist.com/lists/xsl-list/archives/200212/msg00507.html)
> that all of those document() source trees are remaining in
> memory throughout the transformation, adding up to megabytes of data.
> Can anyone suggest a strategy? The process doesn't have to be fast, it
> just has to finish.
>
> Peter Binkley
> Digital Initiatives Technology Librarian
> Information Technology Services
> 4-30 Cameron Library
> University of Alberta Libraries
> Edmonton, Alberta
> Canada T6G 2J8
> Phone: (780) 492-3743
> Fax: (780) 492-9243
> e-mail: peter.binkley@xxxxxxxxxxx
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list