Greg Hudson wrote:
> Third, he clued me in on an interesting hashing trick used in their
> vdelta implementation: vdelta allocates an array of hash table entries
> equal to the length of the string being compressed. The entry used
> for a given index into the string is just &array[index]; thus, the
> index can be implicit, saving four bytes in the hash entries. Thus,
> the hash table looks a bit like:
>
> struct hashslot { struct hashslot *next; };
> struct hashslot *slots[2 * WINDOWSIZE];
> struct hashslot *hashtab[HASHSIZE];
>
> to insert a sequence -> position mapping into the hash table:
>
> hval = hash(sequence);
> slots[ind].next = hashtab[hval];
> hashtab[hval] = &slots[ind];
>
> and you'd traverse the list of match candidates (some of which may be
> hash collisions) for a given sequence like:
>
> for (slot = hashtab[hash(sequence)]; slot != NULL; slot = slot->next) {
> ind = slot - slots;
> ...
> }
>
> Of course, there are other ways to manage the data structure and the
> memory (you could use array positions into the slots table instead of
> pointers, for instance; if the window size is 32K or less, you could
> even use shorts to save memory); that was just the most compact way of
> expressing it.
This is a very nice trick. I think I'll go ahead and use it.
> Fourth, he said that he was finishing up an open-source library for
> data transformations called Vcode, which would contain a Vdelta
> generator with vcdiff output. This is good news; even if the license
> is bogus somehow, it gives us something to refer to (e.g. for settling
> the second point).
Does that mean we should wait for that library, or finish our own
implementation?
Brane
--
Branko Čibej <branko.cibej@hermes.si>
HERMES SoftLab, Litijska 51, 1000 Ljubljana, Slovenia
voice: (+386 1) 586 53 49 fax: (+386 1) 586 52 70
Received on Sat Oct 21 14:36:07 2006