Subject: RE: memory usage of xslt processing
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 13:59:08 +0100
XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.
Running one transformation per row is certainly feasible in principle,
though there may be a significant start-up overhead; you'll only find out by
measurement.
Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
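Here is a minimal sketch of the chunked approach in Java, using only JAXP (javax.xml.transform) and assuming the rows arrive as String[] from an Iterator. The class name ChunkedTransform, the CSV stylesheet and the chunk size are hypothetical. The stylesheet is compiled once into a Templates object, which avoids recompiling it for every chunk. Each chunk is then sent as its own small <dataset> document through a fresh TransformerHandler, so the processor only builds a tree for one chunk at a time. Whether the per-chunk overhead is acceptable is, as said above, a matter for measurement.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class ChunkedTransform {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();
    // Assumes the default JAXP factory supports SAX (true for the JDK's built-in one and Xalan).
    private static final SAXTransformerFactory STF =
        (SAXTransformerFactory) TransformerFactory.newInstance();

    // Hypothetical stylesheet: one comma-separated line per <row>.
    public static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'><xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>,</xsl:if></xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text></xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the Templates object is reused for every chunk. */
    public static Templates compile(String xsl) throws Exception {
        return STF.newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Transforms the rows as a series of <dataset> documents of at most chunkSize rows. */
    public static void transformInChunks(Templates templates, Iterator<String[]> rows,
                                         int chunkSize, Writer out) throws Exception {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<String[]> chunk = new ArrayList<>();
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize || !rows.hasNext()) {
                // Fresh handler per chunk: only this chunk's tree is built at once.
                TransformerHandler ch = STF.newTransformerHandler(templates);
                ch.setResult(new StreamResult(out));
                fireDataset(ch, chunk);
                chunk.clear();
            }
        }
    }

    // Fires the same <dataset><row><value> events as the original StringArrayXMLReader.
    private static void fireDataset(TransformerHandler ch, List<String[]> rows)
            throws SAXException {
        ch.startDocument();
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {
            ch.startElement("", "row", "row", EMPTY_ATTR);
            for (String v : row) {
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(v.toCharArray(), 0, v.length());
                ch.endElement("", "value", "value");
            }
            ch.endElement("", "row", "row");
        }
        ch.endElement("", "dataset", "dataset");
        ch.endDocument();
    }

    public static void main(String[] args) throws Exception {
        Templates t = compile(CSV_XSL);
        List<String[]> data = List.of(
            new String[] {"1", "dog", "an animal may be dangerous"},
            new String[] {"2", "cat", "an animal likes milk"});
        StringWriter out = new StringWriter();
        transformInChunks(t, data.iterator(), 1, out);
        System.out.print(out);
    }
}
```

With a text (CSV) output method the per-chunk results simply concatenate; with XML or HTML output you would also have to write the enclosing wrapper element yourself.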
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx]
> Sent: 19 April 2006 13:36
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: memory usage of xslt processing
>
> Hi,
>
> I have the following task:
> Create an arbitrarily formatted file (XML/HTML/CSV, whatever)
> based on a SELECT from a database.
>
> One constraint is that the amount of data fetched from the
> database cannot be held in memory as a whole.
> Another constraint is that I cannot use XML functionality in
> the database; I have to implement the functionality on top of
> our database access framework. This database access framework
> fetches records one after another.
> And I have to use Java and Xalan.
>
> My idea was to wrap every row fetched from the database in
> simple generic XML and feed it to Xalan.
>
> Let's take an example:
> If my result set from the database looks like:
>
> ID Name Description
> -- ---- -----------
> 1 "dog" "an animal may be dangerous"
> 2 "cat" "an animal likes milk"
>
> I create the following XML:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <dataset>
>   <row>
>     <value>1</value>
>     <value>dog</value>
>     <value>an animal may be dangerous</value>
>   </row>
>   <row>
>     <value>2</value>
>     <value>cat</value>
>     <value>an animal likes milk</value>
>   </row>
> </dataset>
>
> I create this XML by firing SAX events from a Java class
> (StringArrayXMLReader) which implements the
> org.xml.sax.XMLReader interface.
> It has three methods:
>
> // ch is presumably the ContentHandler registered via
> // setContentHandler() (here the TransformerHandler);
> // EMPTY_ATTR is an empty org.xml.sax.helpers.AttributesImpl.
>
> public void init() throws SAXException {
>     ch.startDocument();
>     ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
> }
>
> public void close() throws SAXException {
>     ch.endElement("", "dataset", "dataset");
>     ch.endDocument();
> }
>
> // Emits one <row> element with a <value> child per column.
> public void parse(String[] input) throws SAXException {
>     ch.startElement("", "row", "row", EMPTY_ATTR);
>     for (int i = 0; i < input.length; ++i) {
>         ch.startElement("", "value", "value", EMPTY_ATTR);
>         ch.characters(input[i].toCharArray(), 0, input[i].length());
>         ch.endElement("", "value", "value");
>     }
>     ch.endElement("", "row", "row");
> }
>
> The parse method creates a <row>...</row> entry for each
> String array passed to it.
> The StringArrayXMLReader is associated with a
> TransformerHandler, which uses an XSL stylesheet to transform
> the XML to the desired output.
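For reference, a minimal self-contained sketch of this setup, assuming the row-firing code talks to a TransformerHandler directly instead of going through a full XMLReader implementation (a TransformerHandler is itself a SAX ContentHandler). RowPusher and the CSV stylesheet are hypothetical; init/parse/close mirror the three methods above. With the JDK's built-in processor (XSLTC), the handler builds an internal tree from all the events and runs the stylesheet only inside endDocument(), so nothing reaches the Writer before close(). That is consistent with the behaviour described in the next paragraph, and it means memory grows with the number of rows.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Pushes rows straight into a TransformerHandler (hypothetical helper). */
public class RowPusher {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: one comma-separated line per <row>.
    public static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'><xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>,</xsl:if></xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text></xsl:template>"
        + "</xsl:stylesheet>";

    private final TransformerHandler ch;

    public RowPusher(String xsl, Writer out) throws Exception {
        // Assumes the default factory supports SAX (true for the JDK and Xalan).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        ch = stf.newTransformerHandler(new StreamSource(new StringReader(xsl)));
        ch.setResult(new StreamResult(out)); // must be set before the first event
    }

    public void init() throws SAXException {
        ch.startDocument();
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
    }

    public void parse(String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String v : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(v.toCharArray(), 0, v.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    public void close() throws SAXException {
        ch.endElement("", "dataset", "dataset");
        ch.endDocument(); // with the JDK's XSLTC, the transformation runs here
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        RowPusher p = new RowPusher(CSV_XSL, out);
        p.init();
        p.parse(new String[] {"1", "dog", "an animal may be dangerous"});
        p.parse(new String[] {"2", "cat", "an animal likes milk"});
        System.out.println("chars written before close(): " + out.toString().length());
        p.close();
        System.out.print(out);
    }
}
```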
>
> What happens is that I call init() (and thus startDocument())
> when the fetch from the database starts, and close() (and
> thus endDocument()) after the fetch has finished.
> I observed that the XSLT processing starts only when
> endDocument() is called.
> This is not acceptable for me, because I fear that the XSLT
> processor keeps all the rows in memory until endDocument()
> is called, in which case I risk running into an
> OutOfMemoryError.
>
> My second idea was to eliminate the init()/close() methods
> and to treat each <row>...</row> section as a complete input
> document for the processor. This has the disadvantage that I
> have to create the head and tail of the output document
> manually (and in my example I get a NullPointerException when
> the transformer is called twice).
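A NullPointerException on the second call is consistent with reusing one TransformerHandler for two documents. Without seeing the code this is only a guess, but a TransformerHandler is best treated as single-use, so one plausible fix is to compile the stylesheet once into a Templates object and create a fresh handler for every row. The sketch below does that and also writes the document head and tail by hand. PerRowTransform, the <table> wrapper and the row stylesheet are hypothetical; the stylesheet uses omit-xml-declaration so that each row's output is a bare fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

public class PerRowTransform {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();
    // Assumes the default factory supports SAX (true for the JDK and Xalan).
    private static final SAXTransformerFactory STF =
        (SAXTransformerFactory) TransformerFactory.newInstance();

    // Hypothetical stylesheet: each <row> document becomes one table row.
    public static final String ROW_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='row'><tr><xsl:for-each select='value'>"
        + "<td><xsl:value-of select='.'/></td></xsl:for-each></tr></xsl:template>"
        + "</xsl:stylesheet>";

    private final Templates templates; // compiled once, reused for every row
    private final Writer out;

    public PerRowTransform(String xsl, Writer out) throws Exception {
        this.templates = STF.newTemplates(new StreamSource(new StringReader(xsl)));
        this.out = out;
    }

    // The document head and tail are written by hand, outside any transformation.
    public void writeHead() throws Exception { out.write("<table>\n"); }
    public void writeTail() throws Exception { out.write("</table>\n"); }

    /** Transforms one row as a complete <row> document with a fresh handler. */
    public void transformRow(String[] input) throws Exception {
        TransformerHandler ch = STF.newTransformerHandler(templates);
        ch.setResult(new StreamResult(out));
        ch.startDocument();
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String v : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(v.toCharArray(), 0, v.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
        ch.endDocument(); // output for this row is written here
        out.write("\n");
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        PerRowTransform t = new PerRowTransform(ROW_XSL, out);
        t.writeHead();
        t.transformRow(new String[] {"1", "dog", "an animal may be dangerous"});
        t.transformRow(new String[] {"2", "cat", "an animal likes milk"});
        t.writeTail();
        System.out.print(out);
    }
}
```

Only one row's tree exists at a time, at the cost of one transformation set-up per row; values such as "<" or "&" are still escaped by the serializer.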
>
> I have the following questions:
> Is it possible to create the output without having all the
> data in memory at once?
> The base XML for XSLT processing
> <dataset>
>   <row><value>...
>   <row><value>...
> </dataset>
> looks very simple, and the supplied XSL stylesheets will not
> be complex, so I hope to get this working.
> I also think that the task in general, producing formatted
> output from a potentially very large data pool, should be a
> common one.
> Unfortunately I have not done much XSLT processing in the past,
> so I lack the experience (only a bit of libxslt, which I fed a DOM tree).
> If anyone has some useful links, I would be very glad to hear of them.
> My test code is available at:
>
> http://randspringer.de/sax_row.tar and
> http://randspringer.de/sax.tar
>
> If someone could have a look at it I would really appreciate it.
>
> Thomas
>
>