
Re: svn binary deltas: can they be tuned or sth?..

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2006-02-10 12:38:11 CET

On Thu, Feb 09, 2006 at 11:06:12PM +0100, Molle Bestefich wrote:
> Malcolm Rowe wrote:
> > Is the file ordering the same in each tarfile?
>
> Why wouldn't they be?
> It's tar'ing up an ext3 filesystem - would the order change between two reads?
> (I tried to find a sort option on --tar, but seems there is none..)
>

It's probably not that, though you could check fairly easily by diffing
the output of 'tar ztf' for the two archives. In general, nothing
guarantees that the ordering of files in the tar stays the same between
runs, though it probably has. As Daniel mentioned, you'd be better off
making sure that the new (i.e., not previously existing) files go to the
end of the archive (no, I've no idea how to achieve that, I'm afraid),
since that will maximise the commonality between the two archives.
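
If diffing the 'tar ztf' listings by hand is awkward, the same ordering
check can be sketched in Python. This is only an illustration: the helper
names (member_names, same_order, make_tar) are made up for the example,
and make_tar just builds tiny in-memory archives to try it on.

```python
import io
import tarfile


def member_names(tar_bytes):
    """Return member names in the order they are stored in the archive."""
    # "r:*" lets tarfile detect gzip (or no) compression by itself.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        return tar.getnames()


def same_order(old_names, new_names):
    """True if members present in both archives keep their relative order."""
    common = set(old_names) & set(new_names)
    return ([n for n in old_names if n in common]
            == [n for n in new_names if n in common])


def make_tar(names):
    """Build a small gzipped tar in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


old = member_names(make_tar(["a", "b", "c"]))
new = member_names(make_tar(["a", "b", "c", "d"]))  # new file appended at the end
print(same_order(old, new))
```

For real archives, read the two files as bytes and pass them to
member_names instead of using make_tar.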

> > Did emerge do anything silly like update the mtime for each file?
>
> Heh, probably :-).
> It does an rsync of about 60.000 files.

Offhand, that doesn't sound like it would affect the mtime. If it did,
though, note that the tar files would differ at every file header block
(which means throughout the file), and since you're gzipping (with
--rsyncable), the compressed data would differ from each such point to
the end of the gzip 'rsync window', which means _lots_ of small changes.
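
To illustrate the header-block point, here is a small Python sketch that
builds two uncompressed tars with identical contents but different
mtimes and lists which 512-byte blocks differ. The file names, sizes
and mtime values are invented for the example.

```python
import io
import tarfile

BLOCK = 512  # tar stores headers and file data in 512-byte blocks


def make_tar(files, mtime):
    """Build an uncompressed ustar archive in memory with a fixed mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def differing_blocks(a, b):
    """Return the indexes of the 512-byte blocks that differ between a and b."""
    return [i // BLOCK for i in range(0, min(len(a), len(b)), BLOCK)
            if a[i:i + BLOCK] != b[i:i + BLOCK]]


files = [("a.txt", b"x" * 600), ("b.txt", b"y" * 100)]
old = make_tar(files, mtime=1000000000)
new = make_tar(files, mtime=1000000001)
print(differing_blocks(old, new))
```

Only the two header blocks change here, but in an archive of tens of
thousands of files that is one changed block in front of every file.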

> > At what points is gzip resync'ing? It's possible that there isn't as
> > much commonality between the files as you'd expect.
>
> Not sure I understand the question.
> [...]
> which has a # define RSYNC_WIN 4096, is that what you mean?
>

Well, as I understand it, 'gzip --rsyncable' just gets gzip to discard
the compression model at regular intervals, meaning that a change in a
file will only propagate as far as the next discard point. I'm not sure
how that affects additions or deletions, but presumably there's some
reason that they don't cause avalanches throughout the whole of the file.
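
One plausible reason, sketched below in Python, is that the discard
points are picked from the data itself rather than from fixed byte
offsets, so they move along with the data when something is inserted.
This is an assumption for illustration, not a description of what gzip
actually does: the rule used (reset when the sum of the last WINDOW
bytes is a multiple of WINDOW) and the function names are made up, and
only the 4096 value is borrowed from the RSYNC_WIN define quoted above.

```python
import random

WINDOW = 4096  # borrowed from the quoted RSYNC_WIN; the rule below is assumed


def reset_points(data, window=WINDOW):
    """Offsets where the sum of the previous `window` bytes is a multiple of `window`."""
    points = []
    total = 0
    for i, byte in enumerate(data):
        total += byte
        if i >= window:
            total -= data[i - window]  # drop the byte that left the window
        if i >= window - 1 and total % window == 0:
            points.append(i + 1)
    return points


def shifted_points_match(old, new, offset, added, window=WINDOW):
    """Check that reset points whose window lies past the insertion just move by `added`."""
    old_after = {p + added for p in reset_points(old, window) if p - window >= offset}
    new_after = {p for p in reset_points(new, window) if p - window >= offset + added}
    return old_after == new_after


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(100_000))
new = old[:1000] + b"0123456789" + old[1000:]  # insert 10 bytes near the start
print(shifted_points_match(old, new, offset=1000, added=10))
```

With fixed offsets (every 4096 bytes, say) the same insertion would
leave every later discard point sitting on different data.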

Note that the --rsyncable idea is _significantly_ better if it is
controlled by the application producing the output file, rather than
just being fired at regular intervals. For example, 'tar' could
choose to discard the current model at the start of each file header,
or something similar.
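
As a rough sketch of application-controlled discard points, the Python
below uses zlib's Z_FULL_FLUSH after each member, which resets the
compression state so that later output no longer depends on earlier
input. It is not how tar or gzip work today; compress_members and the
sample data are invented for the example.

```python
import zlib


def compress_members(members):
    """Compress members into one zlib stream, discarding the model after each."""
    comp = zlib.compressobj()
    pieces = []
    for data in members:
        # Z_FULL_FLUSH byte-aligns the output and resets the compression
        # state, so the next member is compressed from a clean slate.
        pieces.append(comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH))
    tail = comp.flush()  # finish the stream (final block and checksum)
    return pieces, tail


old_members = [b"alpha " * 200, b"beta " * 200, b"gamma " * 200]
new_members = [b"ALPHA! " * 150, b"beta " * 200, b"gamma " * 200]

old_pieces, old_tail = compress_members(old_members)
new_pieces, new_tail = compress_members(new_members)

# Only the first member changed, so only its compressed piece should differ.
print([a == b for a, b in zip(old_pieces, new_pieces)])

# The pieces still form one valid stream when concatenated.
restored = zlib.decompress(b"".join(new_pieces) + new_tail)
print(restored == b"".join(new_members))
```

The trailing checksum still differs between the two streams, but that
is a handful of bytes at the very end rather than a change that spreads
through the rest of the output.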

> Maybe Gentoo's gzip is just defective wrt. --rsyncable?
>

Dunno, you'd have to test it. This seems to be moving off-topic though,
so unless there's a demonstrable problem with Subversion's diff algorithm
per se, it's probably best to follow up on a Gentoo-specific list, where
presumably people have practical experience with using gzip --rsyncable.

Regards,
Malcolm

Received on Fri Feb 10 12:39:06 2006

This is an archived mail posted to the Subversion Dev mailing list.
