Hi all,
TL;DR: An XSLT 2.0 stylesheet that restructures a flat SVRL document into a hierarchical tree takes ~20 minutes for a 40 MB input (≈346,000 elements) on Saxon-PE 12.9. I suspect the following-sibling:: / preceding-sibling:: lookups are the culprit. I would be grateful for hints on how to rewrite the two hot templates (svrl:active-pattern, svrl:fired-rule) while preserving the exact same output.

Background:
Schematron validation produces an SVRL (Schematron Validation Report Language) document. To make the report accessible to domain experts, the SVRL is post-processed in two steps:
(1) Transformation from the rather flat SVRL into a hierarchical XML tree (the step in question).
(2) A domain-specific enrichment of that restructured tree.
From the resulting in-memory tree, two output chains are derived: a) HTML, and b) XSL-FO, rendered to PDF via FOP.

The intermediate result of step (1) is NOT serialised to disk; it is held in an xsl:variable and consumed directly by step (2):

<xsl:import href="HierarchicalSVRL.xsl"/>
<xsl:variable name="strukt-svrl">
  <xsl:apply-templates mode="restructure"/>
</xsl:variable>

Environment:
XSLT processor: Saxon-HE 12.9, no extensions
XSLT version: 2.0
Input size: ~40 MB SVRL, ~346,000 elements
Runtime of step (1): ~20 minutes

Profiling observation:
Running Saxon with -TP (profile) shows the dominant "total time (net/ms)" for these two templates, which I would like to optimise:
* xsl:template element(Q{http://purl.oclc.org/dsdl/svrl}active-pattern)
* xsl:template element(Q{http://purl.oclc.org/dsdl/svrl}fired-rule)

The stylesheet (HierarchicalSVRL.xsl):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:hsvrl="http://tu-dresden.de/vlp/schematron/hierarchical-svrl"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs svrl"
version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- The following templates in 'mode="restructure"' perform the restructuring process -->
  <xsl:template match="@*|comment()" mode="restructure">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*|comment()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="svrl:*" mode="restructure">
    <xsl:element name="{local-name()}" namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
      <xsl:apply-templates select="@*|comment()|node()" mode="#current"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="*" mode="restructure">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@*|comment()|node()" mode="#current"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="svrl:schematron-output" mode="restructure">
    <xsl:element name="{local-name()}" namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
      <xsl:apply-templates select="@*" mode="#current"/>
      <xsl:comment> This is a restructured SVRL document, which does not comply with ISO 19757-3 Annex D grammar! </xsl:comment>
      <xsl:apply-templates select="comment()" mode="#current"/>
      <xsl:apply-templates select="svrl:text" mode="#current"/>
      <xsl:apply-templates select="svrl:ns-prefix-in-attribute-values" mode="#current"/>
      <xsl:apply-templates select="svrl:active-pattern" mode="#current"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="svrl:active-pattern" mode="restructure">
    <xsl:element name="{local-name()}" namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
      <xsl:apply-templates select="@*|comment()" mode="#current"/>
      <xsl:apply-templates select="*" mode="#current"/>
      <xsl:apply-templates select="following-sibling::svrl:fired-rule[count(preceding-sibling::svrl:active-pattern[1] | current()) = 1]" mode="#current"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="svrl:fired-rule[@flag = 'ignore']" mode="restructure">
    <xsl:apply-templates mode="restructure"/>
  </xsl:template>

  <xsl:template match="svrl:failed-assert[preceding-sibling::*[1]/@flag = 'ignore']" mode="restructure" priority="2">
    <xsl:apply-templates mode="restructure"/>
  </xsl:template>

  <xsl:template match="svrl:successful-report[preceding-sibling::*[1]/@flag = 'ignore']" mode="restructure" priority="2">
    <xsl:apply-templates mode="restructure"/>
  </xsl:template>

  <xsl:template match="svrl:fired-rule[not(@role)]" mode="restructure" priority="-1">
    <xsl:apply-templates mode="restructure"/>
  </xsl:template>

  <xsl:template match="svrl:fired-rule" mode="restructure">
    <xsl:element name="{local-name()}" namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
      <xsl:apply-templates select="@*|comment()" mode="#current"/>
      <xsl:variable name="next-element" select="parent::*/following-sibling::*[1]"/>
      <xsl:if test="$next-element/not(svrl:*)">
        <xsl:apply-templates select="$next-element" mode="#current"/>
      </xsl:if>
      <xsl:apply-templates select="following-sibling::svrl:failed-assert[count(preceding-sibling::svrl:fired-rule[1] | current()) = 1]
                                 | following-sibling::svrl:successful-report[count(preceding-sibling::svrl:fired-rule[1] | current()) = 1]" mode="#current"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

"Minimal" Input Example (SVRL):
(I can post the full sample if anyone wants it; I trimmed it here for brevity.)

<svrl:schematron-output xmlns:fx="http://tu-dresden.de/vlp/schematron/functions"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:planpro="http://www.plan-pro.org/regeln/struktur"
                        xmlns:saxon="http://saxon.sf.net/"
                        xmlns:schold="http://www.ascc.net/xml/schematron"
                        xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:xhtml="http://www.w3.org/1999/xhtml"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        title="Regelbasis für PlanPro-PlaZ"
                        schemaVersion="ISO19757-3">
  <svrl:ns-prefix-in-attribute-values uri="http://www.plan-pro.org/regeln/struktur" prefix="planpro"/>
  <svrl:ns-prefix-in-attribute-values uri="http://tu-dresden.de/vlp/schematron/functions" prefix="fx"/>
  <svrl:ns-prefix-in-attribute-values uri="http://www.w3.org/2001/XMLSchema-instance" prefix="xsi"/>
  <svrl:active-pattern document="file:/C:/Users/xyz/PlaZ/PlanPro-samples/Testdateien/Bezeichnertest2.xml"
                       id="ID123" name="test rule" fpi="12345678-9ABC-DEF1-2345-6789ABCDEF12" see="test"
                       planpro:workpackage="BASISOBJEKTE" planpro:version="1.10.0.1">
    <svrl:text>
      <planpro:description xmlns="http://purl.oclc.org/dsdl/schematron">
        Human readable (sometimes lengthy) description of the specific rule, to be applied to the whole input XML file
      </planpro:description>
      <planpro:comment xmlns="http://purl.oclc.org/dsdl/schematron"/>
      <planpro:test xmlns="http://purl.oclc.org/dsdl/schematron">
        <planpro:success>human readable success message</planpro:success>
        <planpro:error>human readable error message</planpro:error>
      </planpro:test>
      <planpro:output xmlns="http://purl.oclc.org/dsdl/schematron">PlanPro object type</planpro:output>
    </svrl:text>
  </svrl:active-pattern>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:failed-assert test="false()"
                      location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Anhang[1]">
    <svrl:text>Es ist ein Fehler aufgetreten.</svrl:text>
    <svrl:diagnostic-reference diagnostic="guid">317691e7-6b55-428d-925b-9107f72b9bc0</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="typ">Anhang</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="bereich">Betrachtung</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="aufbau">00</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s1">Anhang</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s2">file name</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s3"/>
    <svrl:diagnostic-reference diagnostic="s4"/>
    <svrl:diagnostic-reference diagnostic="s5"/>
    <svrl:diagnostic-reference diagnostic="s6"/>
    <svrl:diagnostic-reference diagnostic="s7"/>
  </svrl:failed-assert>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
  <svrl:failed-assert test="false()"
                      location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Aussenelementansteuerung[1]">
    <svrl:text>Es ist ein Fehler aufgetreten.</svrl:text>
    <svrl:diagnostic-reference diagnostic="guid">bc2efe9a-a70b-4249-9c84-80636c08b093</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="typ">Aussenelementansteuerung</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="bereich">Betrachtung</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="aufbau">01</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s1">Außenelementansteuerung</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s2">Gleisfreimelde-Innenanlage</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s3">AEA blah</svrl:diagnostic-reference>
    <svrl:diagnostic-reference diagnostic="s4"/>
    <svrl:diagnostic-reference diagnostic="s5"/>
    <svrl:diagnostic-reference diagnostic="s6"/>
    <svrl:diagnostic-reference diagnostic="s7"/>
  </svrl:failed-assert>
</svrl:schematron-output>

Desired Restructured Output (excerpt):

<?xml version="1.0" encoding="UTF-8"?>
<schematron-output xmlns="http://tu-dresden.de/vlp/schematron/hierarchical-svrl"
                   title="Regelbasis für PlanPro-PlaZ"
                   schemaVersion="ISO19757-3"><!-- This is a restructured SVRL document, which does not comply with ISO 19757-3 Annex D grammar!
-->
  <ns-prefix-in-attribute-values uri="http://www.plan-pro.org/regeln/struktur" prefix="planpro"/>
  <ns-prefix-in-attribute-values uri="http://tu-dresden.de/vlp/schematron/functions" prefix="fx"/>
  <ns-prefix-in-attribute-values uri="http://www.w3.org/2001/XMLSchema-instance" prefix="xsi"/>
  <active-pattern xmlns:planpro="http://www.plan-pro.org/regeln/struktur"
                  document="file:/C:/Users/xyz/PlaZ/PlanPro-samples/Testdateien/Bezeichnertest2.xml"
                  id="ID123" name="test rule" fpi="12345678-9ABC-DEF1-2345-6789ABCDEF12" see="test"
                  planpro:workpackage="BASISOBJEKTE" planpro:version="1.10.0.1">
    <text>
      <planpro:description>
        Human readable (sometimes lengthy) description of the specific rule, to be applied to the whole input XML file
      </planpro:description>
      <planpro:comment/>
      <planpro:test>
        <planpro:success>human readable success message</planpro:success>
        <planpro:error>human readable error message</planpro:error>
      </planpro:test>
      <planpro:output>PlanPro object type</planpro:output>
    </text>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error">
      <failed-assert test="false()"
                     location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Anhang[1]">
        <text>Es ist ein Fehler aufgetreten.</text>
        <diagnostic-reference diagnostic="guid">317691e7-6b55-428d-925b-9107f72b9bc0</diagnostic-reference>
        <diagnostic-reference diagnostic="typ">Anhang</diagnostic-reference>
        <diagnostic-reference diagnostic="bereich">Betrachtung</diagnostic-reference>
        <diagnostic-reference diagnostic="aufbau">00</diagnostic-reference>
        <diagnostic-reference diagnostic="s1">Anhang</diagnostic-reference>
        <diagnostic-reference diagnostic="s2">file name</diagnostic-reference>
        <diagnostic-reference diagnostic="s3"/>
        <diagnostic-reference diagnostic="s4"/>
        <diagnostic-reference diagnostic="s5"/>
        <diagnostic-reference diagnostic="s6"/>
        <diagnostic-reference diagnostic="s7"/>
      </failed-assert>
    </fired-rule>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
    <fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error">
      <failed-assert test="false()"
                     location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Aussenelementansteuerung[1]">
        <text>Es ist ein Fehler aufgetreten.</text>
        <diagnostic-reference diagnostic="guid">bc2efe9a-a70b-4249-9c84-80636c08b093</diagnostic-reference>
        <diagnostic-reference diagnostic="typ">Aussenelementansteuerung</diagnostic-reference>
        <diagnostic-reference diagnostic="bereich">Betrachtung</diagnostic-reference>
        <diagnostic-reference diagnostic="aufbau">01</diagnostic-reference>
        <diagnostic-reference diagnostic="s1">Außenelementansteuerung</diagnostic-reference>
        <diagnostic-reference diagnostic="s2">Gleisfreimelde-Innenanlage</diagnostic-reference>
        <diagnostic-reference diagnostic="s3">AEA blah</diagnostic-reference>
        <diagnostic-reference diagnostic="s4"/>
        <diagnostic-reference diagnostic="s5"/>
        <diagnostic-reference diagnostic="s6"/>
        <diagnostic-reference diagnostic="s7"/>
      </failed-assert>
    </fired-rule>
  </active-pattern>
</schematron-output>

In short: every fired-rule has to swallow its following failed-assert / successful-report siblings up to the next fired-rule, and every active-pattern has to swallow its following fired-rule group up to the next active-pattern.

Is there an XSLT 2.0 way that produces identical output but avoids the apparent n^2 cost of the sibling-axis idioms above? Any pointer, code sketch, or "you are doing this wrong because..." is highly welcome.

Thanks in advance, and best regards,
Susanne
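
One direction that may be worth trying, as an untested sketch rather than a verified fix: positional grouping with xsl:for-each-group and group-starting-with. Each child of svrl:schematron-output is then visited once per grouping level, instead of every active-pattern and fired-rule re-scanning its siblings. The sketch is written as an override layer that imports HierarchicalSVRL.xsl and replaces three templates. The variable name hsvrl-ns and the parameter names members and results are made up for illustration. It assumes the usual SVRL sibling order (active-pattern, then fired-rule, then its failed-assert / successful-report elements); input that violates this order could be grouped differently than by the sibling-axis predicates. The xsl:comment text is copied from the stylesheet as posted above, so its whitespace would need to be checked against the real file if byte-identical output matters.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs svrl"
                version="2.0">

  <!-- Sketch: all templates not overridden here are inherited from the posted stylesheet. -->
  <xsl:import href="HierarchicalSVRL.xsl"/>

  <!-- Hypothetical helper variable; the URI is the one used in the posted stylesheet. -->
  <xsl:variable name="hsvrl-ns" as="xs:string"
                select="'http://tu-dresden.de/vlp/schematron/hierarchical-svrl'"/>

  <xsl:template match="svrl:schematron-output" mode="restructure">
    <xsl:element name="{local-name()}" namespace="{$hsvrl-ns}">
      <xsl:apply-templates select="@*" mode="#current"/>
      <xsl:comment> This is a restructured SVRL document, which does not comply with ISO 19757-3 Annex D grammar! </xsl:comment>
      <xsl:apply-templates select="comment()" mode="#current"/>
      <xsl:apply-templates select="svrl:text" mode="#current"/>
      <xsl:apply-templates select="svrl:ns-prefix-in-attribute-values" mode="#current"/>
      <!-- Outer grouping: every group starts at an active-pattern. -->
      <xsl:for-each-group
          select="svrl:active-pattern | svrl:fired-rule | svrl:failed-assert | svrl:successful-report"
          group-starting-with="svrl:active-pattern">
        <!-- A leading group without an active-pattern is skipped;
             the posted templates never select such elements either. -->
        <xsl:if test="self::svrl:active-pattern">
          <xsl:apply-templates select="." mode="#current">
            <xsl:with-param name="members" select="subsequence(current-group(), 2)"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:element>
  </xsl:template>

  <xsl:template match="svrl:active-pattern" mode="restructure">
    <!-- Siblings between this active-pattern and the next one. -->
    <xsl:param name="members" as="element()*"/>
    <xsl:element name="{local-name()}" namespace="{$hsvrl-ns}">
      <xsl:apply-templates select="@*|comment()" mode="#current"/>
      <xsl:apply-templates select="*" mode="#current"/>
      <!-- Inner grouping: every group starts at a fired-rule. -->
      <xsl:for-each-group select="$members" group-starting-with="svrl:fired-rule">
        <xsl:if test="self::svrl:fired-rule">
          <xsl:apply-templates select="." mode="#current">
            <xsl:with-param name="results" select="subsequence(current-group(), 2)"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:element>
  </xsl:template>

  <!-- Rules flagged 'ignore' do not match here and fall through to the imported template. -->
  <xsl:template match="svrl:fired-rule[not(@flag = 'ignore')]" mode="restructure">
    <!-- failed-assert / successful-report siblings up to the next fired-rule. -->
    <xsl:param name="results" as="element()*"/>
    <xsl:element name="{local-name()}" namespace="{$hsvrl-ns}">
      <xsl:apply-templates select="@*|comment()" mode="#current"/>
      <!-- Unchanged from the posted template. -->
      <xsl:variable name="next-element" select="parent::*/following-sibling::*[1]"/>
      <xsl:if test="$next-element/not(svrl:*)">
        <xsl:apply-templates select="$next-element" mode="#current"/>
      </xsl:if>
      <xsl:apply-templates select="$results" mode="#current"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Because the override layer has higher import precedence, the step-(2) stylesheet would import this file instead of HierarchicalSVRL.xsl. Since identical output is the requirement, the result of both variants should be compared on a small sample before trusting it on the 40 MB input.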



