Subject: Re: comparing nodesets to each other
From: "Kai Hackemesser" <kaha@xxxxxx>
Date: Mon, 11 Apr 2005 21:46:25 +0200 (MEST)
Hello, Aron,
Let me try to define this more precisely:
- Two 'relation' nodes count as changed if they have the same value in
relation/Attribute[@Name='FindNumber']/Value but the overall text value of
their children differs.
- A 'relation' node must also be listed if there is no corresponding
'relation' node with the same relation/Attribute[@Name='FindNumber']/Value.
- I need to know in which list a node is changed/added/removed.
- The whole list of changes needs to be sorted by
Attribute[@Name='FindNumber']/Value.
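
To make the rules above concrete, here is a minimal sketch in Python (not
XSLT) using only the standard library. It is one possible reading of the
rules, not a finished solution. Assumptions: 'relation' nodes are direct
children of the root 'object'; the "text value" of a relation is the sequence
of its non-blank text pieces; FindNumber values are zero-padded strings, so a
plain string sort orders them correctly. The sample() helper is hypothetical
and builds a cut-down version of the structure (the truncated float Attribute
is left out).

```python
import xml.etree.ElementTree as ET

FIND_NUMBER = "Attribute[@Name='FindNumber']/Value"


def index_relations(root):
    """Map each FindNumber to the non-blank text pieces of its relation node."""
    index = {}
    for rel in root.findall("relation"):
        key = rel.findtext(FIND_NUMBER, default="").strip()
        # Assumption: whitespace-only text between elements is irrelevant.
        index[key] = tuple(t.strip() for t in rel.itertext() if t.strip())
    return index


def diff_relations(root_a, root_b):
    """Return (FindNumber, status) pairs sorted by FindNumber."""
    a, b = index_relations(root_a), index_relations(root_b)
    changes = []
    for key in sorted(a.keys() | b.keys()):
        if key not in b:
            changes.append((key, "removed"))  # present only in list a
        elif key not in a:
            changes.append((key, "added"))    # present only in list b
        elif a[key] != b[key]:
            changes.append((key, "changed"))  # same key, different text
    return changes


def sample(parts):
    """Hypothetical helper: build an <object> from (FindNumber, PartNumber) pairs."""
    rel = (
        '<relation><Attribute Type="string" Name="FindNumber">'
        "<Value><![CDATA[{}]]></Value></Attribute>"
        '<object><Attribute Type="string" Name="PartNumber">'
        "<Value><![CDATA[{}]]></Value></Attribute></object></relation>"
    )
    body = "".join(rel.format(f, p) for f, p in parts)
    return ET.fromstring("<object>" + body + "</object>")


list_a = sample([("0005", "Part1"), ("0010", "Part2"), ("0015", "Part3")])
list_b = sample([("0005", "Part1"), ("0015", "Part3b")])

for find_number, status in diff_relations(list_a, list_b):
    print(find_number, status)
```

With the two sample lists, 0010 is reported as removed and 0015 as changed.
The status is always relative to list 'a', which is how the sketch answers
the "in which list" question.
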
Regards, Kai
> Kai,
>
> IMO the general problem of finding the differences between any 2 XML
> documents is, shall we say, challenging. Something that helps such an
> operation is being extremely precise about what constitutes a difference,
> and being able to formulate precedence rules in comparison operations. An
> earlier respondent illustrated the need for this with an example that
> "added" a node in the second document. It's very likely *you* have a good
> idea of what you're after, but in these types of problems you'll get the
> most help if you can express your "rules for comparison" in [formal]
> written form.
>
> Consider the following documents:
>
> doc1.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2"/>
> </doc>
>
> doc2.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2">
> <para n="1"/>
> </chapter>
> </doc>
>
> What *exactly* would you like in your final output? Do you want to see
> only the node <para n="1"/>? Do you want to see <para n="1"/> and all its
> parent nodes? You see where this is going? It helps to be precise.
>
> Also, while writing a "general" differencing algorithm would be
> worthwhile, it's probably not simple. To start, you'll have better luck if
> you constrain your problem as it relates to your domain. One way to do this
> is by identifying a least granular level for your purposes--perhaps a node
> or "level" below which identifying differences is superfluous. In the
> example above, you could say:
>
> --chapter nodes are compared by their "n" attribute
> --if there are any differences between 2 <chapter> nodes or any of their
> descendants, the entire <chapter> node is considered "changed", and that
> of doc2.xml is output
>
> I've done this type of "constrained" comparison with success.
>
> Here's another approach to consider: preprocess each XML document to a
> "standard" format, then use a textual diff tool. The idea here is that you
> apply an XSL transform to doc1.xml so that <chapter> nodes are sequential,
> their descendants are ordered in a specific way, etc. Do the same with
> doc2.xml. Then use a diff tool (e.g. Beyond Compare, from
> http://www.scootersoftware.com/ ) to check differences. Note that this
> method is sensitive to line breaks, so it's not trivial to implement.
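
The normalize-then-diff idea can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the XSL transform described
above: it swaps in Python's xml.etree.ElementTree for the normalization step
and difflib for the external diff tool. It assumes that sibling order is not
significant (children are sorted by tag and attributes), that text following
a closing tag can be ignored, and that one line per element is an acceptable
"standard" format. Writing one element per line is what sidesteps the
line-break problem.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return a canonical list of lines, one per element, in sorted order."""
    lines = []

    def sort_key(elem):
        return (elem.tag, sorted(elem.attrib.items()))

    def walk(elem, depth):
        attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
        text = " ".join((elem.text or "").split())  # collapse whitespace
        lines.append("  " * depth + f"<{elem.tag}{attrs}>{text}")
        for child in sorted(elem, key=sort_key):
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines


def xml_diff(doc_a, doc_b):
    """Return only the added ('+ ') and removed ('- ') normalized lines."""
    delta = difflib.ndiff(normalize(doc_a), normalize(doc_b))
    return [line for line in delta if line.startswith(("+ ", "- "))]


doc1 = '<doc><chapter n="1"/><chapter n="2"/></doc>'
doc2 = """<doc>
  <chapter n="2">
    <para n="1"/>
  </chapter>
  <chapter n="1"/>
</doc>"""

for line in xml_diff(doc1, doc2):
    print(line)
```

Here doc2 has its chapters reordered and reindented on purpose. After
normalization the only reported difference is the added para element.
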
>
> Regards
>
> --A
>
>
>
> >From: "Kai Hackemesser" <kaha@xxxxxx>
> >Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >Subject: Re: comparing nodesets to each other
> >Date: Mon, 11 Apr 2005 18:18:47 +0200 (MEST)
> >
> >Hello, David,
> >
> >Thanks for the response. The errors you mentioned have already happened,
> >which is why I'm currently at a loss as to how to solve this.
> >
> >Here is the structure of the recipe (simplified):
> >
> ><object>
> > <relation>
> > <Attribute Type="string" Name="FindNumber">
> > <Value><![CDATA[0005]]></Value>
> > </Attribute>
> > <Attribute Type="float" Name="...
> > <object>
> > <Attribute Type="string" Name="PartNumber">
> > <Value><![CDATA[Part1]]></Value>
> > </Attribute>
> > </object>
> > </relation>
> > <relation>
> > <Attribute Type="string" Name="FindNumber">
> > <Value><![CDATA[0010]]></Value>
> > </Attribute>
> > <Attribute Type="float" Name="...
> > <object>
> > <Attribute Type="string" Name="PartNumber">
> > <Value><![CDATA[Part2]]></Value>
> > </Attribute>
> > </object>
> > </relation>
> > <relation>
> > <Attribute Type="string" Name="FindNumber">
> > <Value><![CDATA[0015]]></Value>
> > </Attribute>
> > <Attribute Type="float" Name="...
> > <object>
> > <Attribute Type="string" Name="PartNumber">
> > <Value><![CDATA[Part3]]></Value>
> > </Attribute>
> > </object>
> > </relation>
> ></object>
> >
> >needs to be compared against a similar structure:
> ><object>
> > <relation>
> > <Attribute Type="string" Name="FindNumber">
> > <Value><![CDATA[0005]]></Value>
> > </Attribute>
> > <Attribute Type="float" Name="...
> > <object>
> > <Attribute Type="string" Name="PartNumber">
> > <Value><![CDATA[Part1]]></Value>
> > </Attribute>
> > </object>
> > </relation>
> > <relation>
> > <Attribute Type="string" Name="FindNumber">
> > <Value><![CDATA[0015]]></Value>
> > </Attribute>
> > <Attribute Type="float" Name="...
> > <object>
> > <Attribute Type="string" Name="PartNumber">
> > <Value><![CDATA[Part3b]]></Value>
> > </Attribute>
> > </object>
> > </relation>
> ></object>
> >
> >(Each object or relation node has more than one Attribute node.)
> >
> >So I need to extract all differences, such as attribute changes, missing
> >nodes, altered nodes, and added nodes. To identify a node I use the
> >FindNumber Attribute node of each relation node. A missing node is one
> >whose FindNumber Attribute value is missing in nodelist 'b'. An added node
> >is one whose FindNumber Attribute value is missing in nodelist 'a'. An
> >altered node is one whose FindNumber Attribute value is present in both
> >nodelists, but whose Attribute nodes or object/Attribute nodes differ. I
> >think a simple text compare would be enough to test for alteration.
> >
> >Regards,
> >Kai
> >