Subject: Re: memory usage of xslt processing
From: Thomas Porschberg <thomas.porschberg@xxxxxxxxx>
Date: Thu, 20 Apr 2006 06:32:11 +0200
On Wed, 19 Apr 2006 13:59:08 +0100,
"Michael Kay" <mike@xxxxxxxxxxxx> wrote:
> XSLT processors generally read the whole document into memory. Some
> products may be able to avoid this under certain circumstances, for
> example see
> http://www.saxonica.com/documentation/sourcedocs/serial.html for
> Saxon.
I have to use Xalan, and I have heard of Xalan's "SQL extensions". I will
have to try them out.
>
> Running one transformation per row is certainly feasible in principle
> though there may be a significant start-up overhead - you'll only
> find out by measurement.
Yes, but http://randspringer.de/sax_row.tar currently gives me an error.
And it is "ugly" because I have to provide the header myself.
>
> Alternatively, why not retrieve the data from the database in
> transformer-sized chunks?
That does not remove the problem with the header. Of course, it should be
faster to run the stylesheet on several rows at a time than on one row
at a time.
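Michael's chunking suggestion and the header problem can be handled together. Below is a minimal sketch using only the standard JAXP API (javax.xml.transform.sax), under these assumptions: the class ChunkedTransform, the stylesheet parameter "header" and the CSV stylesheet are hypothetical, and the Iterator stands in for the database access framework's record-by-record fetch. The stylesheet is compiled once into a Templates object, so the compile cost is paid once; each chunk of rows becomes its own small <dataset> document fed to a fresh TransformerHandler; and only the first chunk is told to write the header.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class ChunkedTransform {
    // Hypothetical stylesheet: writes the header line only when the
    // parameter "header" is 'yes'; each <row> becomes one CSV line.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='header' select=\"'no'\"/>"
        + "<xsl:template match='/dataset'>"
        + "<xsl:if test=\"$header = 'yes'\">ID,Name,Description&#10;</xsl:if>"
        + "<xsl:apply-templates select='row'/>"
        + "</xsl:template>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:value-of select='.'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final AttributesImpl NO_ATTRS = new AttributesImpl();

    // Same events as the original parse(String[]) method.
    static void emitRow(TransformerHandler h, String[] row) throws SAXException {
        h.startElement("", "row", "row", NO_ATTRS);
        for (String v : row) {
            h.startElement("", "value", "value", NO_ATTRS);
            h.characters(v.toCharArray(), 0, v.length());
            h.endElement("", "value", "value");
        }
        h.endElement("", "row", "row");
    }

    // Transforms the rows chunkSize at a time, appending all output to out.
    static void transformInChunks(Iterator<String[]> rows, int chunkSize, Writer out)
            throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once; the Templates object is reused for every chunk.
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(XSL)));
        boolean first = true;
        while (rows.hasNext()) {
            // A new handler per chunk: each chunk is a complete small document.
            TransformerHandler h = stf.newTransformerHandler(templates);
            h.getTransformer().setParameter("header", first ? "yes" : "no");
            h.setResult(new StreamResult(out));
            h.startDocument();
            h.startElement("", "dataset", "dataset", NO_ATTRS);
            for (int n = 0; n < chunkSize && rows.hasNext(); n++) {
                emitRow(h, rows.next());
            }
            h.endElement("", "dataset", "dataset");
            h.endDocument(); // this chunk's output is complete once this returns
            first = false;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "dog", "an animal may be dangerous"},
            new String[] {"2", "cat", "an animal likes milk"});
        StringWriter out = new StringWriter();
        transformInChunks(rows.iterator(), 1, out);
        System.out.print(out);
    }
}
```

The processor only ever holds one chunk's tree, so memory is bounded by the chunk size rather than by the size of the result set. The best chunk size is something to find by measurement.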
As a next step, I will have a look at
http://stx.sourceforge.net/ and http://joost.sourceforge.net/.
Thank you,
Thomas
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx]
> > Sent: 19 April 2006 13:36
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: memory usage of xslt processing
> >
> > Hi,
> >
> > I have the following task:
> > Create an arbitrarily formatted file (XML, HTML, CSV, whatever)
> > based on a SELECT from a database.
> >
> > One constraint is that the data fetched from the database
> > cannot be held in memory as a whole.
> > Another constraint is that I cannot use the XML functionality of
> > the database; I have to implement it on top of our database
> > access framework, which fetches the records one after another.
> > And I have to use Java and Xalan.
> >
> > My idea was to wrap every row fetched from the database
> > in simple, generic XML and feed it to Xalan.
> >
> > Let's take an example.
> > If my result set from the database looks like this:
> >
> > ID  Name   Description
> > --  -----  ----------------------------
> > 1   "dog"  "an animal may be dangerous"
> > 2   "cat"  "an animal likes milk"
> >
> > I create the following XML:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <dataset>
> >   <row>
> >     <value>1</value>
> >     <value>dog</value>
> >     <value>an animal may be dangerous</value>
> >   </row>
> >   <row>
> >     <value>2</value>
> >     <value>cat</value>
> >     <value>an animal likes milk</value>
> >   </row>
> > </dataset>
> >
> > I generate this XML as SAX events in a Java class
> > (StringArrayXMLReader) that implements the
> > org.xml.sax.XMLReader interface.
> > It has three methods:
> >
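> > // Fields used below (declared elsewhere in the class): ch, the
> > // ContentHandler receiving the events, and EMPTY_ATTR, an empty
> > // org.xml.sax.Attributes instance.
> >
> > // Starts the document and opens the <dataset> root element.
> > public void init() throws SAXException {
> >     ch.startDocument();
> >     ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
> > }
> >
> > // Closes the root element and ends the document.
> > public void close() throws SAXException {
> >     ch.endElement("", "dataset", "dataset");
> >     ch.endDocument();
> > }
> >
> > // Emits one <row> with a <value> child per column.
> > public void parse(String[] input) throws SAXException {
> >     ch.startElement("", "row", "row", EMPTY_ATTR);
> >     for (int i = 0; i < input.length; ++i) {
> >         ch.startElement("", "value", "value", EMPTY_ATTR);
> >         ch.characters(input[i].toCharArray(), 0, input[i].length());
> >         ch.endElement("", "value", "value");
> >     }
> >     ch.endElement("", "row", "row");
> > }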
> >
> > The parse method creates the <row>...</row> entry for a given
> > String array.
> > The StringArrayXMLReader is associated with a
> > TransformerHandler, which uses an XSL stylesheet to transform
> > the XML into the desired output.
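For reference, the pipeline described above can be sketched end to end with the standard JAXP API. This is a minimal sketch, not the code from the tar files: the class name RowPipeline and the CSV stylesheet are hypothetical, the rows are hard-coded instead of fetched, and it assumes the default TransformerFactory accepts SAX input (it checks SAXTransformerFactory.FEATURE first). The comments mark where init(), parse() and close() would fire their events.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

final class RowPipeline {
    // Hypothetical stylesheet: writes each <row> as one comma-separated line.
    static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/dataset'><xsl:apply-templates select='row'/></xsl:template>"
        + "<xsl:template match='row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:value-of select='.'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    static String transform(String[][] rows) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // newTransformerHandler() is only available on a SAXTransformerFactory.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("factory does not accept SAX input");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(CSV_XSL)));

        // The handler plays the role of "ch": it receives the SAX events.
        TransformerHandler ch = stf.newTransformerHandler(templates);
        StringWriter out = new StringWriter();
        ch.setResult(new StreamResult(out)); // must be set before startDocument()

        ch.startDocument();                                    // init()
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {                            // parse(row)
            ch.startElement("", "row", "row", EMPTY_ATTR);
            for (String v : row) {
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(v.toCharArray(), 0, v.length());
                ch.endElement("", "value", "value");
            }
            ch.endElement("", "row", "row");
        }
        ch.endElement("", "dataset", "dataset");               // close()
        ch.endDocument(); // a tree-building processor typically transforms here
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String[][] rows = {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"},
        };
        System.out.print(transform(rows));
    }
}
```

Run as is, this should print the two rows as "1,dog,an animal may be dangerous" and "2,cat,an animal likes milk". Because a processor of this kind builds the whole <dataset> tree before applying templates, the memory use grows with the number of rows, which matches the behaviour observed below.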
> >
> > What happens here is that I call init() (and thus startDocument())
> > when the fetch from the database starts, and close() (and thus
> > endDocument()) after the fetch has finished.
> > I observed that the XSLT processing starts only when endDocument()
> > is called.
> > This is not acceptable for me, because I fear that the XSLT
> > processor keeps all the rows in memory until endDocument() is
> > called, in which case I risk running into an OutOfMemoryError.
> >
> > My second idea was to eliminate the init()/close() methods
> > and to treat each <row>...</row> section as a complete input
> > document for the processor. This has the disadvantage that
> > I have to create the head and tail of the document
> > manually (and in my example I get a NullPointerException when
> > the transformer is called twice).
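On the NullPointerException: as far as I know, JAXP does not promise that a TransformerHandler can be reused after endDocument() has been called. Reusing one handler for the second row is therefore a plausible cause, though I have not checked this against the test code. Here is a minimal sketch of the per-row variant. It assumes a hypothetical stylesheet (ROW_XSL) whose input document is a single <row>, and PerRowTransform is an illustrative name. The caller writes the header once, and every row gets a fresh handler created from the same compiled Templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

final class PerRowTransform {
    // Hypothetical stylesheet whose input document is a single <row>.
    static final String ROW_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/row'>"
        + "<xsl:for-each select='value'>"
        + "<xsl:value-of select='.'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final AttributesImpl NO_ATTRS = new AttributesImpl();

    static void run(Iterator<String[]> rows, Writer out) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(ROW_XSL)));
        out.write("ID,Name,Description\n"); // the header, written once by the caller
        while (rows.hasNext()) {
            String[] row = rows.next();
            // A new handler for every row document instead of reusing the old one.
            TransformerHandler h = stf.newTransformerHandler(templates);
            h.setResult(new StreamResult(out));
            h.startDocument();
            h.startElement("", "row", "row", NO_ATTRS);
            for (String v : row) {
                h.startElement("", "value", "value", NO_ATTRS);
                h.characters(v.toCharArray(), 0, v.length());
                h.endElement("", "value", "value");
            }
            h.endElement("", "row", "row");
            h.endDocument();
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        run(Arrays.asList(
                new String[] {"1", "dog", "an animal may be dangerous"},
                new String[] {"2", "cat", "an animal likes milk"}).iterator(), out);
        System.out.print(out);
    }
}
```

Each row then costs one handler creation and one small transformation. This is the start-up overhead Michael mentions, and only measurement will show whether it matters.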
> >
> > I have the following questions:
> > Is it possible to create the output without holding all of the
> > data in memory?
> > The source XML for XSLT processing
> > <dataset>
> > <row><value>...
> > <row><value>...
> > </dataset>
> > looks very simple, and the supplied XSL stylesheets will not be
> > complex, so I hope to get it working.
> > I also think that the task in general, producing formatted
> > output from a potentially very large data pool, should be a common
> > one. Unfortunately, I have not done much XSLT processing in the past,
> > so I lack the experience (only a bit of libxslt, which I fed a DOM tree).
> > If someone has some useful links, I would be very glad to hear of them.
> > My test code is available at:
> >
> > http://randspringer.de/sax_row.tar and
> > http://randspringer.de/sax.tar
> >
> > If someone could have a look at it, I would really appreciate it.
> >
> > Thomas
> >