At 2006-03-08 00:44 +0100, lists@xxxxxxxxxxxxx wrote:
>I have a simple problem but I'm not sure whether xslt is the proper tool.

For such a simple issue it might make sense not to incur the overhead of building the tree for thousands of files, so I would question the use of XSLT.

Below are both an XSLT solution using a derivative of the identity transform, and a Python solution that buffers the group element and re-emits it with a modified attribute. Note that I have made a number of assumptions in the Python that may or may not apply in your actual situation, as opposed to this test.

The advantage of the Python implementation is speed: it uses the SAX streaming interface and does not incur the overhead of building the input tree. This might help for your thousands of files.

Note that switching from XSLT to SAX will also help if the input files are very large. For my UBL schema analysis work I had simple transforms for input XML files of 165Mb, and rewriting my initial XSLT solution in Python/SAX brought performance to an acceptable level (in one case it changed a one-hour invocation to less than a minute).

I hope this helps.

. . . . . . . Ken

T:\ftemp>type bitfaeule.xml
<?xml version="1.0"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>

T:\ftemp>xslt bitfaeule.xml bitfaeule.xsl con
<?xml version="1.0" encoding="utf-8"?><filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"/>
<test name="testCheck2"/>
</group><group section="basic-alt">
<test name="testCheck1"/>
<test name="testCheck2"/>
</group>
</filter>
T:\ftemp>type bitfaeule.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:template match="group[@section='basic']">
  <xsl:copy-of select="."/>
  <xsl:copy>
    <xsl:attribute name="section">basic-alt</xsl:attribute>
    <xsl:copy-of select="node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
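
For comparison, the same tree-based idea (build the whole input tree, then copy the matching element and emit a second copy with a changed attribute) might be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration and not part of the original solution: the function name duplicate_group and its parameters are made up here, and unlike the stylesheet above it keeps any other attributes on the copied group.

```python
import copy
import xml.etree.ElementTree as ET


def duplicate_group(xml_text, section="basic", new_section="basic-alt"):
    """Return xml_text with each matching <group> followed by a renamed copy.

    Hypothetical helper: the whole document is parsed into a tree first,
    which is the overhead a streaming approach avoids.
    """
    root = ET.fromstring(xml_text)
    # Collect (parent, child) pairs first so that inserting new elements
    # does not disturb the iteration.
    matches = [(parent, child)
               for parent in root.iter()
               for child in list(parent)
               if child.tag == "group" and child.get("section") == section]
    for parent, child in matches:
        clone = copy.deepcopy(child)        # copies attributes, children, tail
        clone.set("section", new_section)   # only the section value changes
        parent.insert(list(parent).index(child) + 1, clone)
    return ET.tostring(root, encoding="unicode")
```

Calling duplicate_group() on the text of bitfaeule.xml should give a second group with section="basic-alt" directly after the "basic" one.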
T:\ftemp>type bitfaeule.xml
<?xml version="1.0"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>

T:\ftemp>python bitfaeule.py <bitfaeule.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group><group section="basic-alt">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>
T:\ftemp>type bitfaeule.py
# A Python program to capture and repeat generated XML syntax
# (Python 2 syntax)

from xml.sax import parse, SAXParseException
from xml.sax.xmlreader import AttributesImpl
from xml.sax.saxutils import XMLGenerator
import sys

false = 0
true = not false

# define a class that both buffers and outputs strings based on status
# Note: this does not support nested elements being copied, only one at a time
class copyOut:
    def __init__(this, out):
        if out is None:
            out = sys.stdout
        this._out = out      # remember to whom writing is being done
        this._buffer = false # start off with no buffering of writing
        this._output = true  # start off with all writing to output
        this._store = ""     # local store of the copy

    # an opportunity to change the direction of writing
    def target( this, buffer, output ):
        this._buffer = buffer
        this._output = output

    # accommodate a writing request
    def write( this, str ):
        if this._buffer: # then buffer the string
            this._store += str
        if this._output: # then write out the string
            this._out.write( str )

    # the store of data is ready to be written
    def flushStore( this ):
        this._out.write( this._store )
        this._store = "" # nothing need be remembered

class myGenerator( XMLGenerator ):
    def __init__(this, out=None, encoding="iso-8859-1"):
        # take advantage of existing generator, but override output
        this._copyOut = copyOut( out )
        XMLGenerator.__init__(this, this._copyOut, encoding)

    def startElement( this, name, attrs):
        # put out the current element regardless
        XMLGenerator.startElement( this, name, attrs )
        # determine if this is the element to be copied
        if name == "group": # might be it, check attributes
            for ( attr, value ) in attrs.items():
                if ( attr, value ) == ( "section", "basic" ):
                    # yes, this is the element to be copied, so buffer
                    # modified start tag (assume only one attribute)
                    this._copyOut.target( true, false )
                    XMLGenerator.startElement( this, name,
                        AttributesImpl( { "section":"basic-alt" } ) )
                    # now buffer and write all content of element
                    this._copyOut.target( true, true )

    def endElement( this, name ):
        # put out the end of the current element regardless
        XMLGenerator.endElement( this, name )
        # determine if this is the element being copied
        if name == "group":
            # it may or may not be, but it won't hurt to flush empty buffer
            this._copyOut.flushStore()
            this._copyOut.target( false, true )

gen = myGenerator()

#=============================================================================
#
# Main logic

try:
    # processing the input file using the defined SAX events
    parse( sys.stdin, gen )
except IOError, (errno, strerror):
    # input comes from standard input, so there is no file name to report
    sys.exit( "I/O error(%s): %s: %s" % (errno, strerror, "<stdin>") )
except SAXParseException:
    sys.exit( "File does not parse as well-formed XML: %s" % "<stdin>" )

sys.exit( )

# end of file