Subject: RE: String hashing code
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 14 Dec 2007 09:43:22 -0000
It sounds as if you want a result that is ASCII, that is of modest length,
and that has a high probability of being unique without offering a
guarantee.
You could do the equivalent of:

    string(sum(for $c at $p in string-to-codepoints(document-uri(/))
               return $c * $p))

(The equivalent in XSLT is a bit more long-winded because an XPath 2.0 "for"
expression lacks XQuery's "at $p" positional variable.)
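
For illustration, here is a minimal sketch of how that position-weighted sum might be written as an XSLT 2.0 stylesheet function, iterating over positions with "1 to count(...)" in place of "at $p". The function name f:weighted-sum and its namespace URI are hypothetical, chosen only for this example. The result is a checksum that is probably, not certainly, unique.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/ns/local-functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper (name and namespace are assumptions).
       Multiplies each codepoint by its 1-based position and sums the
       products, e.g. 'ab' gives 97*1 + 98*2 = 293. -->
  <xsl:function name="f:weighted-sum" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:variable name="cps" as="xs:integer*"
                  select="string-to-codepoints($s)"/>
    <!-- An empty string yields an empty sequence, and sum(()) is 0. -->
    <xsl:sequence
        select="string(sum(for $p in 1 to count($cps)
                           return $cps[$p] * $p))"/>
  </xsl:function>

</xsl:stylesheet>
```

It could then be combined with generate-id() in an attribute value template, for example href="{f:weighted-sum(string(document-uri(/)))}-{generate-id(.)}.xml" on xsl:result-document (the ".xml" suffix and the "-" separator are likewise assumptions). Wrapping document-uri(/) in string() gives an empty string when the document has no URI.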
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Deborah Pickett [mailto:debbiep-list-xsl@xxxxxxxxxx]
> Sent: 14 December 2007 07:36
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: String hashing code
>
> A challenge to the XSLT demigods...
>
> I am processing a number of separate XML documents using an
> Ant <xslt> task, pulling out the MathML that is embedded
> inside them into their own XML files using
> xsl:result-document (where I render them using Batik).
> I want to make sure that the result document names don't
> clash, but because they are across several source files,
> generate-id() isn't going to suffice. There are thousands of
> source files, all with English-sounding names spread across
> many directories.
>
> I was thinking of hashing document-uri(/) to produce a
> probably-unique string that I can then append generate-id(.)
> to. I rejected
> encode-for-uri() as producing strings that are too long, and
> for not anonymizing the document uri enough. All the hashing
> algorithms I know (MD5, for instance) happen to be heavy on
> bitwise operations, and I feel dirty doing bitwise operations
> with arithmetic.
>
> I prefer not to escape to non-XSLT, because I am providing
> this as part of a library that needs to run on almost any
> XSLT 2.0 platform.
>
> Any clever ideas?