Subject: RE: Memory problem when tokenizing big data
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 10 Jan 2006 14:44:39 -0000
The str:tokenize() function defined in EXSLT constructs a tree containing
one element for each token. Unless the implementation is clever enough to
construct a virtual or lazy tree, this is going to take a fair bit of
memory.
By contrast, the XPath 2.0 tokenize() function returns a sequence of
strings, and it's a reasonable bet that any decent implementation is going
to be pipelined, so that it reads off the tokens one at a time as they are
needed.
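For example, an XSLT 2.0 version of the template in the original question could look like this (a sketch, assuming an XSLT 2.0 processor such as Saxon 8; note that the XPath 2.0 tokenize() function takes a regular expression, so '\s+' matches any run of whitespace, including the newlines used as separators below):

```
<!-- XSLT 2.0 sketch: tokenize() returns a sequence of strings,
     which a pipelined processor can consume one token at a time
     instead of materializing a node tree first -->
<xsl:template match="textdata">
  <data>
    <xsl:for-each select="tokenize(., '\s+')">
      <e>
        <xsl:value-of select="."/>
      </e>
    </xsl:for-each>
  </data>
</xsl:template>
```

Unlike str:tokenize(), which takes a string of single-character delimiters, the second argument here is a regex, so empty tokens from consecutive separators do not arise when matching whitespace runs.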
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Richard Zhang [mailto:richard_zhang@xxxxxxxxxx]
> Sent: 10 January 2006 14:30
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Memory problem when tokenizing big data
>
> Thanks for your reply to my prior question about breaking
> down strings.
>
> Now I am trying to use str:tokenize to break down some big data.
>
> The input big data is like:
>
> <textdata sep=" 

">
> 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9
> ...
> ...
> </textdata>
> ...
> ...
> <textdata sep=" 

">
> 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9
> ...
> ...
> </textdata>
>
> and my xsl template is like:
>
> <xsl:template match="textdata">
> <data>
> <xsl:for-each select="str:tokenize(.,' 

')">
> <e>
> <xsl:value-of select="."/>
> </e>
> </xsl:for-each>
> </data>
> </xsl:template>
>
> The textdata can be very big. My question is: will the
> tokenizing have
> problems when handling big data? If yes, how big is the data
> that str:tokenize
> can handle? I ran the transformation in JBuilder and it shows
> some '10mb
> heap left' problem.
>
> Thanks a lot.
> Richard