Re: fs-rep-sharing branch

From: Alan Barrett <apb_at_cequrux.com>
Date: Wed, 22 Oct 2008 20:01:43 +0200

On Wed, 22 Oct 2008, Greg Stein wrote:
> We *still* have all the problems that md5 is fully-intertwined in our
> code. I'm still not willing to do double-checksums and kill millions
> of coders for a few researchers who could simply tar their candidate
> pairs together, or gzip them. Yes, that's the brutal truth :-P ... the
> researchers need to use workarounds, and the millions get a fast
> product.

Would it be possible to detect collisions and use a different index key
instead? By "index" I mean "whatever you use to map from short keys
(e.g. MD5 hashes) to actual stored content". Perhaps something like
this:

    calculate hash of content;
    if (hash does not exist as a key in the index) {
        store content indexed by the hash;
    } else if (index key refers to content that really is identical) {
        re-use that index key;
    } else {
        do something clever to deal with the hash collision;
    }

"Do something clever" could involve choosing a different index key based
on both the content hash and a collision serial number, incrementing the
serial number until previously-stored identical content is found, or
until the key is not found in the index.
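
A minimal Python sketch of the scheme above, assuming an in-memory dict stands in for the index. The names `ContentIndex`, `put`, `get`, and the `digest.serial` key format are hypothetical illustrations, not anything from the Subversion code base.

```python
import hashlib


class ContentIndex:
    """Hypothetical index mapping short keys to stored content."""

    def __init__(self, hash_fn=None):
        # Default to an MD5 hex digest; a different function can be
        # injected (for example, a weak one to provoke collisions).
        self._hash_fn = hash_fn or (lambda data: hashlib.md5(data).hexdigest())
        self._store = {}

    def put(self, content: bytes) -> str:
        """Store content and return the index key that refers to it."""
        digest = self._hash_fn(content)
        serial = 0
        while True:
            # Serial 0 uses the bare hash; later serials append a suffix.
            key = digest if serial == 0 else f"{digest}.{serial}"
            if key not in self._store:
                # Key is unused: store the content under it.
                self._store[key] = content
                return key
            if self._store[key] == content:
                # Content really is identical: re-use the existing key.
                return key
            # Genuine collision: try the next serial number.
            serial += 1

    def get(self, key: str) -> bytes:
        return self._store[key]
```

Passing a deliberately weak hash function, such as one that returns the same string for every input, forces every distinct content to collide and exercises the serial-number path. Note that the full content comparison happens only when a key already exists, so the common no-collision case costs one hash and one lookup.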

--apb (Alan Barrett)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-10-22 20:05:34 CEST

