Re: fs-rep-sharing branch

From: Alan Barrett <apb_at_cequrux.com>
Date: Wed, 22 Oct 2008 20:01:43 +0200

On Wed, 22 Oct 2008, Greg Stein wrote:
> We *still* have all the problems that md5 is fully-intertwined in our
> code. I'm still not willing to do double-checksums and kill millions
> of coders for a few researchers who could simply tar their candidate
> pairs together, or gzip them. Yes, that's the brutal truth :-P ... the
> researchers need to use workarounds, and the millions get a fast
> product.

Would it be possible to detect collisions and use a different index key
instead? By "index" I mean "whatever you use to map from short keys
(e.g. MD5 hashes) to actual stored content". Perhaps something like
this:

    calculate hash of content;
    if (hash does not exist as a key in the index) {
        store content indexed by the hash;
    } else if (index key refers to content that really is identical) {
        re-use that index key;
    } else {
        do something clever to deal with the hash collision;
    }

"Do something clever" could involve choosing a different index key based
on both the content hash and a collision serial number, incrementing the
serial number until previously-stored identical content is found, or
until the key is not found in the index.
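A minimal Python sketch of the scheme above, assuming an in-memory dict stands in for the index. `ContentIndex`, `store`, `fetch` and the `(digest, serial)` key layout are hypothetical names chosen for illustration; they are not Subversion's rep-sharing API. The hash function is injectable only so that a deliberately weak hash can force collisions when trying it out; by default it uses MD5.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (content hash, collision serial number).
Key = Tuple[str, int]


def md5_hex(content: bytes) -> str:
    """Default short key: the MD5 hex digest of the content."""
    return hashlib.md5(content).hexdigest()


class ContentIndex:
    """Hypothetical content-addressed store that tolerates hash collisions."""

    def __init__(self, hash_fn: Callable[[bytes], str] = md5_hex) -> None:
        self._hash_fn = hash_fn
        self._entries: Dict[Key, bytes] = {}

    def store(self, content: bytes) -> Key:
        """Return a key for `content`, storing it only if no identical copy exists."""
        digest = self._hash_fn(content)
        serial = 0
        while True:
            key = (digest, serial)
            existing = self._entries.get(key)
            if existing is None:
                # Key not in the index yet: store the content under it.
                self._entries[key] = content
                return key
            if existing == content:
                # Identical content already stored: re-use that key.
                return key
            # Same hash but different bytes: a collision, so try the next serial.
            serial += 1

    def fetch(self, key: Key) -> bytes:
        """Return the content stored under `key`."""
        return self._entries[key]


# Usage: identical content shares one key; with a hash that always collides,
# distinct contents get increasing serial numbers.
weak = ContentIndex(hash_fn=lambda b: "same")
print(weak.store(b"A"), weak.store(b"B"), weak.store(b"A"))
```

Because a lookup stops at the first unused serial number, this sketch assumes entries are never removed from the index (as in an append-only repository); deleting one could hide later entries in the same collision chain. The extra cost is a byte-for-byte comparison against stored content whenever a hash is already present, which is paid even when there is no collision.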

--apb (Alan Barrett)

Received on 2008-10-22 20:05:34 CEST

This is an archived mail posted to the Subversion Dev mailing list.
