Subject: RE: Comparing grouping techniques in terms of performance
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Tue, 6 Apr 2004 18:27:03 +0100
You seem to have varied several things between the two stylesheets. One of
them uses for-each, the other uses apply-templates; one uses the generate-id()
approach to compare node identity, the other uses the count($X|.) technique;
one adds more output; one does sorting. The golden rule with performance
comparisons is to change only one variable at a time. And then you need to
repeat the measurements with a different XSLT processor to see whether the
results are similar.
Michael Kay
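
As an illustration of changing one variable at a time, here is a minimal
sketch (an assumption about how the test could be set up, not something
measured in this thread). Both variants use for-each, neither sorts, and
neither adds HTML wrapper output; the only difference is the node-identity
test in the outer select. The key name and element names are taken from the
quoted stylesheets below.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:key name="contacts-by-surname" match="contact" use="surname" />

  <xsl:template match="records">
    <!-- Variant A: node identity tested with count(. | $X) = 1 -->
    <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]">
      <xsl:value-of select="surname" />,<br />
      <xsl:for-each select="key('contacts-by-surname', surname)">
        <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

  <!-- Variant B: keep everything above unchanged and replace only the
       outer select with:
       contact[generate-id() = generate-id(key('contacts-by-surname', surname)[1])] -->

</xsl:transform>
```

Timing variant A against variant B on the same input isolates the cost of
the identity test. The for-each versus apply-templates difference, the
sorting, and the extra output would each need their own pair of runs.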
> -----Original Message-----
> From: Pieter Reint Siegers Kort [mailto:pieter.siegers@xxxxxxxxxxx]
> Sent: 06 April 2004 16:43
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Comparing grouping techniques in terms of performance
>
> Hi all,
>
> Looking at various requests on the list regarding grouping, especially the
> Muenchian Method, explained very well by Jeni at
> http://www.jenitennison.com/xslt/grouping/muenchian.html, and another
> method I have regularly seen before that uses template processing rather
> than the <for-each> approach (see below), I wanted to see how the two
> methods compare in terms of performance.
>
> So, suppose I have the same input that Jeni uses, but made into a bigger
> XML file (about 2000 entries):
>
> <records>
> <contact id="0001">
> <title>Mr</title>
> <forename>John</forename>
> <surname>Smith</surname>
> </contact>
> <contact id="0002">
> <title>Dr</title>
> <forename>Amy</forename>
> <surname>Jones</surname>
> </contact>
> <contact id="0002">
> <title>Mr</title>
> <forename>Brian</forename>
> <surname>Jones</surname>
> </contact>
> <contact id="0002">
> <title>Ms</title>
> <forename>Fiona</forename>
> <surname>Smith</surname>
> </contact>
> ... repeating the above block ...
> </records>
>
>
> Using the <for-each> approach on my machine [Dell GX-240, Win2003,
> XSelerator 2.6, MSXML 4.0], like this:
>
> <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>
> <xsl:key name="contacts-by-surname" match="contact" use="surname" />
>
> <xsl:template match="records">
> <xsl:for-each select="contact[count(. | key('contacts-by-surname',
> surname)[1]) = 1]">
> <xsl:sort select="surname" />
> <xsl:value-of select="surname" />,<br />
> <xsl:for-each select="key('contacts-by-surname', surname)">
> <xsl:sort select="forename" />
> <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
> </xsl:for-each>
> </xsl:for-each>
> </xsl:template>
>
> </xsl:transform>
>
> showed that the transformation took about 750 msec.
>
> Then, using the template approach (adding just a bit of HTML), as follows:
>
> <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>
> <xsl:key name="contacts-by-surname" match="contact" use="surname" />
>
> <xsl:template match="records">
> <html>
> <body>
> <xsl:apply-templates select="contact[generate-id() =
> generate-id(key('contacts-by-surname', surname))]" mode="groups"/>
> </body>
> </html>
> </xsl:template>
>
> <xsl:template match="contact" mode="groups">
> <ul>
> <xsl:value-of select="surname"/>,<br/>
> <xsl:apply-templates select="key('contacts-by-surname', surname)"/>
> </ul>
> </xsl:template>
>
> <xsl:template match="contact">
> <xsl:value-of select="forename"/> (<xsl:value-of select="title"/>)<br/>
> </xsl:template>
>
> </xsl:transform>
>
> which does practically the same thing, took only about 50 msec, which
> means a speed-up of 750/50 = 15 times!!
>
> I haven't yet been able to test using the .NET XslTransform class, but
> that will come at a later stage...
>
> So for big input files and using MSXML 4.0, I would rather use the second
> approach... wouldn't you all agree?
>
> And if so, shouldn't the second method be the first (and preferred)
> method mentioned by Jeni (after all, everyone points to that page in the
> first instance)?
>
> <prs/>
> http://www.pietsieg.com
> http://www.pietsieg.com/dotnetnuke
> Contributor on www.ASPToday.com
> Co-author on "Professional ASP.NET XML with C#", July 2002 by Wrox Press