
  • To: Michael Kay <mike@s...>
  • Subject: Re: Seeking Examples of XSLT Memory Stress
  • From: 'Liam Quin' <liam@w...>
  • Date: Wed, 17 Aug 2005 14:55:28 -0400
  • Cc: xml-dev@l...
  • In-reply-to: <E1E5S6o-0007k5-KT@m...>
  • References: <20050817172241.GH9894@w...> <E1E5S6o-0007k5-KT@m...>
  • User-agent: Mutt/1.5.9i

On Wed, Aug 17, 2005 at 06:53:41PM +0100, Michael Kay wrote:
>> If the document falls out of scope then both XSLT 1 and 2 allow
>> an implementation to discard it.  I don't think we'll see a
>> procedural way to discard a document otherwise, except as
>> part of something like the XQuery update facility perhaps.

> In practice it's quite difficult to discard the document automatically. The
> spec offers two guarantees:
> 
> (a) if the same document (URI) is loaded again, you'll get the same node
> identifiers
> 
> (b) if the same document (URI) is loaded again, it will have the same
> content
> 
> It would be possible to discard the document and achieve (a) by remembering
> the node identifiers and reusing them if needed. 
Yes.

> Achieving (b) though is really hard, given that the URI might in the
> worst case identify a random number generator. The only real way to do
> it is to serialize a private copy of the document to disk.
You could also behave differently depending on the URI scheme --
an extension to say "trust http expiry times and that the stylesheet
will take no more than 3 hours to run :-) and trust that input files
won't change on disk" might be interesting.
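A rough sketch of the two ideas above. It assumes a processor that keeps loaded documents in a pool keyed by URI. Everything here is hypothetical: `DocumentPool`, `fetch` and `trust_local_files` are illustrative names, and raw bytes stand in for a parsed tree. Discarding a document spills a private copy to a temporary file, so a reload returns the same content (guarantee (b)) even if the URI behaves like a random number generator. The per-URI id models guarantee (a). With `trust_local_files=True`, it takes the URI-scheme shortcut for `file:` URIs and simply re-reads the file.

```python
import os
import tempfile
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse


class DocumentPool:
    """Hypothetical processor-side pool of loaded documents, keyed by URI."""

    def __init__(self, fetch: Callable[[str], bytes], trust_local_files: bool = False):
        self._fetch = fetch                     # hypothetical loader: URI -> document bytes
        self._trust_local = trust_local_files   # assume file: URIs never change during a run
        self._ids: Dict[str, int] = {}          # URI -> stable document id, guarantee (a)
        self._in_memory: Dict[str, bytes] = {}  # documents currently held in memory
        self._spilled: Dict[str, str] = {}      # URI -> path of our private copy on disk

    def load(self, uri: str) -> Tuple[int, bytes]:
        if uri not in self._in_memory:
            if uri in self._spilled:
                # Reload from the private copy, never from the URI itself,
                # so the content cannot have changed: guarantee (b).
                with open(self._spilled[uri], "rb") as f:
                    self._in_memory[uri] = f.read()
            else:
                # First load, or a trusted file: URI that we simply re-read.
                self._in_memory[uri] = self._fetch(uri)
        doc_id = self._ids.setdefault(uri, len(self._ids))
        return doc_id, self._in_memory[uri]

    def discard(self, uri: str) -> None:
        """Free the in-memory copy, keeping enough to honour both guarantees."""
        content = self._in_memory.pop(uri, None)
        if content is None or uri in self._spilled:
            return  # not loaded, or a private copy already exists
        if self._trust_local and urlparse(uri).scheme == "file":
            return  # trust the file not to change; load() will re-read it
        fd, path = tempfile.mkstemp(suffix=".xml")
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        self._spilled[uri] = path

    def close(self) -> None:
        """Delete the private copies once the transformation has finished."""
        for path in self._spilled.values():
            os.remove(path)
        self._spilled.clear()
```

The price of this approach is a disk write every time an untrusted document is discarded. The `file:` shortcut avoids that write, but it is only correct if the file really does not change while the stylesheet runs.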

> The real problem though is in deciding when it's a good idea to discard the
> document. For example, if the stylesheet is working its way through the
> @href links from the primary source document, what's the chance that you'll
> want to visit the same target document more than once?

Are there some special cases that are big wins in practice?
E.g. consider:
    <xsl:template match="foo">
	<!--* load a 500MByte XML file: *-->
	<xsl:variable name="oed" select="doc('oed.xml')" />
	<!--* do stuff with the document *-->
	<xsl:element name="word-of-the-day">
	  <xsl:copy-of select="$oed/dictionary/a/entry[@id = 'ascii']" />
	</xsl:element>
    </xsl:template>

If you don't know how often the template matches, I can see that you
might want to cache the whole document in memory, but you have a
couple of other choices --
(1) save the result of the template -- in this case it depends only on
    the document loaded with doc(), not on the matched node, and I've
    seen this usage often, e.g. to get a document title
(2) drop the document if you get low on memory 

This case is very clear, but I don't know at what point it stops
being optimisable, and I'm sure you've thought about it a lot more
than I have! :-)
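Choice (1) amounts to memoizing the template's result per document instead of caching the document itself. The following is a minimal Python sketch. It assumes, as described above, that the result depends only on the loaded file. `word_of_the_day` is a hypothetical name, and returning the entry's text is a simplification of `xsl:copy-of`.

```python
import functools
import xml.etree.ElementTree as ET


@functools.lru_cache(maxsize=None)
def word_of_the_day(doc_path: str) -> str:
    """Hypothetical stand-in for the template body, memoized per document.

    Only the returned string is cached. The parsed tree becomes unreachable
    as soon as this function returns, so the large document is not kept.
    """
    root = ET.parse(doc_path).getroot()          # the expensive load
    entry = root.find("./a/entry[@id='ascii']")  # /dictionary/a/entry[@id = 'ascii']
    # Simplification: return the entry's text rather than a copy of the node.
    return "".join(entry.itertext()) if entry is not None else ""
```

After the first call, every later match is a cache hit that never touches the file. The memory cost is the size of the result rather than the size of the dictionary.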

> That's why I decided
> that in this case having a user function to tell me when the document is no
> longer needed is rather more useful.

I think it's a good compromise, but I agree with you it'd be hard
to get consensus to add that to XPath Functions and Operators (F&O).

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
