[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH]: Implement xdelta, use it by default

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2005-02-12 01:22:46 CET

Daniel Berlin <dberlin@dberlin.org> writes:

Just a few quick comments on the implementation:

> +/* This is pseudo-adler32, and xdelta, as monotone does it. The
> + adler32 has no prime modulus. This is a direct translation of the
> + C++ code, hence the reason it looks a bit weird. */

monotone is GPL isn't it, can a "direct translation" be released under
the Subversion licence? I think Adler32 is widely used under a
variety of licences, not so sure about xdelta.

> +#define adler32_mask 0x0000ffff
> +#define adler32_char_mask 0x000000ff
> +
> +/* Structure to store the state of our adler32 checksum. */
> +struct adler32
> +{
> + apr_uint32_t s1;
> + apr_uint32_t s2;
> + apr_uint32_t len;
> + apr_uint32_t mask;
> +};
> +
> +/* Feed C into the adler32 checksum. */
> +
> +static inline void

We sort of target C89 and don't use inline, how about using APR_INLINE?

> +adler32_in (struct adler32 *ad, const char c)
> +{
> + ad->s1 += ((apr_uint32_t) (c)) & adler32_char_mask;
> + ad->s1 &= adler32_mask;
> + ad->s2 += ad->s1;
> + ad->s2 &= adler32_mask;
> + ad->len++;
> +}
> +

> +static void
> +find_match (apr_hash_t *matches,
> + const struct adler32 *rolling,
> + const char *a,
> + apr_uint32_t asize,
> + const char *b,
> + apr_uint32_t bsize,
> + apr_uint32_t bpos,
> + apr_uint32_t *aposp,
> + apr_uint32_t *alenp,
> + apr_uint32_t *badvancep,
> + svn_stringbuf_t **pending_insert)
> +{
> + apr_uint32_t sum = adler32_sum (rolling);
> + apr_uint32_t alen, badvance, apos;
> + struct match *match;
> + apr_uint32_t tpos, tlen;
> +
> + /* See if we have a match. */
> + match = apr_hash_get (matches, &sum, sizeof (sum));
> + if (match == NULL)
> + return;
> +
> + /* See where our match started. */
> + tpos = match->pos;
> + tlen = match->len;
> +
> + /* Make sure it's not a false match. */
> + if (memcmp (a + tpos, b + bpos, tlen) != 0)
> + return;
> +
> + apos = tpos;
> + alen = tlen;
> + badvance = tlen;
> +
> + /* Extend the match forward as far as possible */
> + while ((apos + alen >= 0)
> + && (bpos + badvance >= 0)

../svn/subversion/libsvn_delta/vdelta.c: In function 'find_match':
../svn/subversion/libsvn_delta/vdelta.c:452: warning: comparison of unsigned expression >= 0 is always true
../svn/subversion/libsvn_delta/vdelta.c:453: warning: comparison of unsigned expression >= 0 is always true

> + && (apos + alen < asize)
> + && (bpos + badvance < bsize)
> + && (a[apos + alen] == b[bpos + badvance]))
> + {
> + ++alen;
> + ++badvance;
> + }
> +
> + /* See if we can extend backwards into a previous insert hunk. */
> + if (*pending_insert)
> + {
> + while (apos > 0
> + && bpos > 0
> + && a[apos - 1] == b[bpos - 1]
> + && (*pending_insert)->len != 0)
> + {
> + svn_stringbuf_chop (*pending_insert, 1);
> + --apos;
> + --bpos;
> + ++alen;
> + }
> + /* If we completely consumed the entire insert, delete it. */
> + if ((*pending_insert)->len == 0)
> + *pending_insert = NULL;
> + }
> +
> + *aposp = apos;
> + *alenp = alen;
> + *badvancep = badvance;
> +}
> +
> +/* Size of the blocks we compute checksums for. */
> +#define MATCH_BLOCKSIZE 64

Add some rationale for 64 please, even "64 is a guess" or an explicit
"64 is what monotone uses" would do.

> +
> +
> +/* Compute a delta from A to B using xdelta.
> +
> +The basic xdelta algorithm is as follows:
> +
> +1. Go through the source data, checksumming every BLOCKSIZE block of bytes

MATCH_BLOCKSIZE

> + using adler32, and inserting the checksum into a match table with the
> + position of the match.
> +2. Go through the target byte by byte, seeing if that byte starts a match that
> + we have in the match table.
> + 2a. If so, try to extend the match as far as possible both forwards and
> + backwards, and then insert a source copy operation into the delta ops
> + builder for the match.
> + 2b. If not, insert the byte as new data using an insert delta op.
> +
> +Our implementation doesn't immediately insert "insert" operations, it waits
> +until we have another copy, or we are done. The reasoning is twofold:
> +
> +1. Otherwise, we would just be building a ton of 1 byte insert operations
> +2. So that we can extend a source match backwards into a pending insert
> +operation, and possibly remove the need for the insert entirely. This can
> +happen due to stream alignment.

-- 
Philip Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Feb 12 01:24:00 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.