Sat, Jan 11, 2003 at 12:06:36AM +0100, Sander Striker wrote:
> Ofcourse, while working on this I stumbled over the
> current implementation of diff... *blush* I remember
> now that to get something working I used md5 hashes
> of each line... No wonder we perform bad on 6MB files
> (still only 3.5 seconds with a _lot_ of differences
> and matches).
>
Would Adler-32 be more appropriate?
/*
* An Adler-32 implementation per RFC1950.
*
* "The Adler-32 algorithm is much faster than the CRC32 algorithm yet
* still provides an extremely low probability of undetected errors"
*/
#include <stdio.h>
#include <sys/types.h>
/*
* 65521 is the largest prime less than 65536.
* "That 65521 is prime is important to avoid a possible large class of
* two-byte errors that leave the check unchanged."
*/
#define MOD_BASE 65521
/*
* "The modulo on unsigned long accumulators can be delayed for 5552 bytes,
* so the modulo operation time is negligible."
*/
#define MOD_BLOCK_SIZE 5552
u_int32_t
adler32_update(u_int32_t checksum,
const unsigned char *input,
size_t length)
{
u_int32_t s1 = checksum & 0xFFFF;
u_int32_t s2 = checksum >> 16;
size_t blocks = length / MOD_BLOCK_SIZE;
length %= MOD_BLOCK_SIZE;
while (blocks--)
{
int count = MOD_BLOCK_SIZE;
while (count--)
{
u_int32_t b = *input++;
s1 += b;
s2 += s1;
}
s1 %= MOD_BASE;
s2 %= MOD_BASE;
}
while (length--)
{
u_int32_t b = *input++;
s1 += b;
s2 += s1;
}
return ((s2 % MOD_BASE) << 16) |
(s1 % MOD_BASE);
}
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Jan 12 17:39:25 2003