
  • To: Rich Salz <rsalz@d...>
  • Subject: Re: hashing
  • From: Eric Hanson <eric@a...>
  • Date: Wed, 5 May 2004 20:58:36 +0000
  • Cc: David Megginson <dmeggin@a...>,XML Developers List <xml-dev@l...>
  • In-reply-to: <Pine.LNX.4.44L0.0404292226350.4710-100000@s...>; from rsalz@d... on Thu, Apr 29, 2004 at 10:34:20PM -0400
  • References: <40916503.9080001@a...> <Pine.LNX.4.44L0.0404292226350.4710-100000@s...>
  • User-agent: Mutt/1.2i

I'm only concerned with the documents being conceptually identical.
Different processors might serialize the same instance differently,
but as long as the instances are conceptually the same, that's all
that matters.  So running them through a canonicalization engine
before hashing works great for this.
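
For anyone reading this in the archive later, here is a minimal sketch of
the same sha(c14n(doc)) idea using only the Python standard library. It
assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize()
exists; note that it implements C14N 2.0, not the PyXML canonicalizer the
script quoted below uses, and that SHA-256 is swapped in for SHA-1. The
helper names and the two sample documents are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def byte_hash(xml_text: str) -> str:
    """SHA-256 of the raw text: any formatting difference changes the result."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def c14n_hash(xml_text: str) -> str:
    """SHA-256 of the canonical form (C14N 2.0 via ElementTree, Python 3.8+)."""
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical documents that differ only in attribute order and quoting.
doc_a = '<item id="1" name="x"><v>1</v></item>'
doc_b = "<item name='x' id='1'><v>1</v></item>"

print(byte_hash(doc_a) == byte_hash(doc_b))   # raw hashes differ
print(c14n_hash(doc_a) == c14n_hash(doc_b))   # canonical hashes match
```

By default canonicalize() keeps whitespace-only text nodes, so documents
that differ only in indentation still hash differently unless something
like strip_text=True is passed.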

Anyway, thanks for the code, I gave it a try and it works great.

Eric

Rich Salz (rsalz@d...) wrote:
> If you're concerned about byte-for-byte identity, hashing each file
> is okay; if you're concerned about semantic identity (e.g., the order
> of attributes doesn't matter), then use standard XML canonicalization
> or something similar (but it won't be as good:)
> 
> Here's a portable Python script that compares all files named on
> the command-line:
> 
> ; cat x.py
> import sys,sha
> from xml.dom.ext.reader import PyExpat
> from xml.dom.ext.c14n import Canonicalize
> 
> hashes = {}
> for f in sys.argv[1:]:  # skip sys.argv[0], the script's own name
>     o = sha.sha()
>     if 1:
>         # simple hash of contents
>         o.update(open(f).read())
>     else:
>         # sha(c14n(doc))
>         r = PyExpat.Reader()
>         dom = r.fromStream(open(f))
>         o.update(Canonicalize(dom))
>     h = o.digest()
>     other = hashes.get(h, None)
>     if other:
>         print 'duplicate', f, other
>     else:
>         hashes[h] = f
> ;
> 
> --
> Rich Salz                  Chief Security Architect
> DataPower Technology       http://www.datapower.com
> XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
> XML Security Overview      http://www.datapower.com/xmldev/xmlsecurity.html
> 
> 
