Greg Hudson wrote:
> Third, he clued me in on an interesting hashing trick used in their
> vdelta implementation: vdelta allocates an array of hash table entries
> equal to the length of the string being compressed.  The entry used
> for a given index into the string is just &array[index]; thus, the
> index can be implicit, saving four bytes in the hash entries.  Thus,
> the hash table looks a bit like:
> 
>         struct hashslot { struct hashslot *next; };
>         struct hashslot *slots[2 * WINDOWSIZE];
>         struct hashslot *hashtab[HASHSIZE];
> 
> to insert a sequence -> position mapping into the hash table:
> 
>         hval = hash(sequence);
>         slots[ind].next = hashtab[hval];
>         hashtab[hval] = &slots[ind];
> 
> and you'd traverse the list of match candidates (some of which may be
> hash collisions) for a given sequence like:
> 
>         for (slot = hashtab[hash(sequence)]; slot != NULL; slot = slot->next) {
>           ind = slot - slots;
>           ...
>         }
> 
> Of course, there are other ways to manage the data structure and the
> memory (you could use array positions into the slots table instead of
> pointers, for instance; if the window size is 32K or less, you could
> even use shorts to save memory); that was just the most compact way of
> expressing it.
This is a very nice trick. I think I'll go ahead and use it.
 
> Fourth, he said that he was finishing up an open-source library for
> data transformations called Vcode, which would contain a Vdelta
> generator with vcdiff output.  This is good news; even if the license
> is bogus somehow, it gives us something to refer to (e.g. for settling
> the second point).
Does that mean we should wait for that library, or finish our own
implementation?
    Brane
-- 
Branko Čibej                 <branko.cibej@hermes.si>
HERMES SoftLab, Litijska 51, 1000 Ljubljana, Slovenia
voice: (+386 1) 586 53 49     fax: (+386 1) 586 52 70
Received on Sat Oct 21 14:36:07 2006