I have to say first that I'm not really familiar with the way svn handles
deltification now, so please be patient and just say so when I'm saying
something stupid.
> > There are various proposed solutions in the issue. But for now, I'd
> > like to talk just about solutions we can implement before 1.0 (i.e.,
> > before Beta, i.e., before 0.33 :-) ). The two that seem most
> > realistic are:
> > 1. Prevent deltification on files over a certain size, but create
> > some sort of out-of-band compression command -- something like
> > 'svnadmin deltify/compress/whatever' that a sysadmin or cron job
> > can run during non-peak hours to reclaim disk space.
> > 2. Make svn_fs_merge() spawn a deltification thread (using APR
> > threads) and return success immediately. If the thread fails to
> > deltify, it's not the end of the world: we simply don't get the
> > disk-space savings.
> 3. Never do deltification of any sort in the filesystem code, and
> create an out-of-band compression command that can be run as a
> post-commit hook.
Another solution, which may not be done by 0.33, would be the following:
If we trust that there will be no hash collisions (in SHA or MD5 or whatever,
which may not hold true; see "An analysis of compare-by-hash" below), then we
just save the hash of blocks of data. The boundaries are determined by a
rolling CRC (see also "Finding Similar Files in a Large File System" below),
and a boundary is wherever e.g. the last 14 bits of the CRC are zero.
So we'll get a (data-based) list of (crc, hash, start, length) blocks, which
we then compare against the "new" file.
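
A rough sketch of the block splitting described above, in Python. It is only
an illustration under assumptions: a simple rolling polynomial checksum stands
in for the rolling CRC, and the window size, the base, the modulus and the name
split_blocks are made up for this example. Only the "last 14 bits are zero"
rule and the (crc, hash, start, length) record come from the text.

```python
import hashlib
import os

WINDOW = 48            # bytes covered by the rolling checksum (assumed value)
MASK = (1 << 14) - 1   # a boundary is where the low 14 bits are zero
BASE = 257             # multiplier of the polynomial checksum (assumed)
MOD = (1 << 31) - 1    # prime modulus, keeps the checksum below 2^31


def split_blocks(data, window=WINDOW, mask=MASK):
    """Split data at content-defined boundaries.

    Returns a list of (rolling, md5_hex, start, length) tuples, where
    rolling is the checksum value at the end of the block.
    """
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    blocks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        if i >= window:
            # Remove the byte that just fell out of the window.
            rolling = (rolling - data[i - window] * top) % MOD
        rolling = (rolling * BASE + byte) % MOD
        # Only test once the window is full, so that a boundary depends
        # on exactly the last `window` bytes and nothing else.
        if i + 1 >= window and (rolling & mask) == 0:
            end = i + 1
            digest = hashlib.md5(data[start:end]).hexdigest()
            blocks.append((rolling, digest, start, end - start))
            start = end
    if start < len(data):
        # Whatever follows the last boundary forms the final block.
        digest = hashlib.md5(data[start:]).hexdigest()
        blocks.append((rolling, digest, start, len(data) - start))
    return blocks


if __name__ == "__main__":
    sample = os.urandom(100000)
    for rolling, digest, start, length in split_blocks(sample):
        print(start, length, digest)
```

Because a boundary depends only on the last few bytes, inserting data near the
start of a file shifts the later boundaries but does not change the contents
of the later blocks, so their hashes still match.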
In my upcoming Perl module "Digest::Manber" I take another value as well: the
CRC prior to the boundary.
So we would have e.g. a 128-bit hash, a 32-bit CRC, and length information to
compare for each block, which should make synchronisation faster: we don't
have to compare two full files against each other, but can work from a list
(probably sorted by hash).
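
To make the list comparison concrete, here is a small Python sketch. The
names make_blocks and plan_delta and the ("copy"/"insert", offset, length)
output format are invented for this example and say nothing about what svn
does today. Note that it trusts the hash completely and never compares bytes,
which is exactly the assumption questioned above.

```python
import bisect
import hashlib


def make_blocks(chunks):
    """Build (md5_hex, start, length) records from a list of byte strings."""
    records = []
    offset = 0
    for chunk in chunks:
        records.append((hashlib.md5(chunk).hexdigest(), offset, len(chunk)))
        offset += len(chunk)
    return records


def plan_delta(old_blocks, new_blocks):
    """Describe the new file in terms of blocks of the old one.

    Returns a list of ("copy", old_start, length) for blocks already
    present in the old file and ("insert", new_start, length) for the rest.
    """
    index = sorted(old_blocks)           # sorted by hash first
    keys = [rec[0] for rec in index]
    ops = []
    for digest, start, length in new_blocks:
        pos = bisect.bisect_left(keys, digest)   # binary search by hash
        match = None
        while pos < len(index) and index[pos][0] == digest:
            if index[pos][2] == length:          # hash and length must agree
                match = index[pos]
                break
            pos += 1
        if match is not None:
            ops.append(("copy", match[1], length))
        else:
            ops.append(("insert", start, length))
    return ops


if __name__ == "__main__":
    old = make_blocks([b"aaaa", b"bbbbbb", b"cc"])
    new = make_blocks([b"zz", b"bbbbbb", b"cc", b"aaaa"])
    for op in plan_delta(old, new):
        print(op)
```

Only the block lists are needed for this step; the old file's data is touched
only when the "copy" operations are actually carried out.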
I don't exactly know what is implemented today, but maybe that would make
deltification faster (at the expense of hard disk space, of course).
"An analysis of compare-by-hash": http://www.nmt.edu/~val/review/hash.pdf
"Finding Similar Files in a Large File System"
Received on Thu Nov 6 08:35:06 2003