

md5sum computes a cryptographic hash of each file using the MD5 algorithm.
It's not the fastest option, but it will do what you want.  It's available
on Linux, in Cygwin, and probably other ways.

In a reasonable command shell where Unix commands are available along with
md5sum,

md5sum *.xml | sort

will put the duplicate files on neighboring lines.
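To go one step further and print only the duplicate groups, GNU uniq can
compare just the 32-character hash field.  A minimal sketch, assuming GNU
coreutils (md5sum, uniq with -w/-D) and some illustrative file names:

```shell
# Scratch directory with two identical files and one distinct file.
dir=$(mktemp -d)
printf '<a/>\n' > "$dir/one.xml"
printf '<a/>\n' > "$dir/two.xml"
printf '<b/>\n' > "$dir/three.xml"

# -w32 compares only the first 32 characters (the MD5 hex digest);
# -D prints every line whose hash is duplicated, i.e. the duplicate groups.
md5sum "$dir"/*.xml | sort | uniq -w32 -D
```

Only one.xml and two.xml appear in the output, since three.xml's hash is
unique.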

Jeff

----- Original Message ----- 
From: "Eric Hanson" <eric@a...>
To: <xml-dev@l...>
Sent: Thursday, April 29, 2004 12:58 PM
Subject:  hashing


> I have a large collection of XML documents, and want to find and
> group any duplicates.  The obvious but slow way of doing this is
> to just compare them all to each other.  Is there a better
> approach?
>
> In particular, are there any APIs or standards for "hashing" a
> document so that duplicates could be identified in a similar way
> to what you'd do with a hash table?

