
Re: [PATCH] Skip-deltas, for review

From: William Uther <will+_at_cs.cmu.edu>
Date: 2002-07-27 06:42:04 CEST

On 26/7/02 11:59 PM, "Greg Hudson" <ghudson@MIT.EDU> wrote:

> (Replies to several messages here.)
>
> Branko and Glenn are making a mountain out of a molehill on commit
> costs. The average number of redeltifications is 1.75, up from 1. The
> largest number is log(N), but you get averaging over files as well as
> over commits (unless you commit a lot of stuff precisely in sync with
> each other).
>
> And redeltifications are only one of several things which go on during a
> commit. As I observed in my timing tests, even without the noise of WC
> overhead (i.e. using "svn load"), commit time only goes up by about 5%
> with our repository, and no higher than about 17% in the extreme case
> where we get massive benefits.
>
> This just isn't going to be a problem.

*points* What he said. :)

>> William Uther wrote:
>>> Note that I've moved the "prevnode = node" inside the outer loop.
>
> But "count" is not reset either. So we do 1 2 4 8 16 32 back. I'll add
> a comment.

Comment would be goodness.
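
For anyone else following along, here is a tiny illustrative sketch in C of
the pattern as I read it (not the actual libsvn_fs code; redeltify() and the
plain integer node indices are just stand-ins): count doubles at each level
and is deliberately never reset, so the node-revisions 1, 2, 4, 8, 16 and 32
predecessors behind the new head get redeltified against it.

#include <stdio.h>

/* Stand-in for the real redeltification step (hypothetical). */
static void redeltify(int node, int head)
{
  printf("redeltify node-revision %d against %d\n", node, head);
}

/* Walk back through predecessors of HEAD.  COUNT is deliberately not
   reset between levels, so each level keeps walking from where the
   previous one stopped until it is 2^lev steps behind the head. */
static void deltify_ancestors(int head, int nlevels)
{
  int node = head;
  int count = 0;
  int lev;

  for (lev = 0; lev < nlevels; lev++)
    {
      while (count < (1 << lev) && node > 0)
        {
          node--;            /* follow one predecessor link */
          count++;
        }
      if (node <= 0)
        break;
      redeltify(node, head); /* distances 1, 2, 4, 8, 16, 32 */
    }
}

int main(void)
{
  deltify_ancestors(100, 6); /* pretend node-revision 100 was just committed */
  return 0;
}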

>>> Remember that the average slowdown in commit speed is O(log(log(N)))
>>> - we do log N re-deltifications every log N steps. That's REALLY
>>> small.
>
> No, it's a constant factor.

Not sure I believe you. But either way it is small.

>>> One strange thing: I thought skip-lists would cost more in terms of
>>> space. I was thinking about a factor of 2 (hmm - or should that be
>>> N log N?). Any thoughts on why you aren't seeing this?
>
> * This isn't a skiplist, since the only goal is to get from any given
> node-revision to the head (or other plaintext). We only have to keep
> the longest arrow pointing out of a node-revision.

Yes. The picture I have in my head is:

http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgId=162454

I still don't believe your constant factor. In each row you have X arrows
of length Y for a total of X*Y arrow length per row:

  Row    X        Y        X*Y

   1     N/2      1        N/2
   2     N/4      2        N/2
   3     N/8      4        N/2
   4     N/16     8        N/2

   i     N/2^i    2^(i-1)  N/2

The total arrow length is the sum over the rows. There are log(N) rows,
each contributing N/2 of arrow length, giving O(N log(N)) total arrow
length. That's a factor of log(N) extra, not a constant.
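
If it helps, here is a throwaway C loop that just tallies those rows for one
concrete N (N = 1024, picked arbitrarily):

#include <stdio.h>
#include <math.h>

int main(void)
{
  long n = 1024;            /* number of node-revisions, a power of two */
  long total = 0;
  int i;

  for (i = 1; (1L << i) <= n; i++)
    {
      long arrows = n >> i;        /* X = N/2^i         */
      long length = 1L << (i - 1); /* Y = 2^(i-1)       */
      total += arrows * length;    /* each row adds N/2 */
    }

  printf("total arrow length = %ld, (N/2)*log2(N) = %.0f\n",
         total, (n / 2.0) * log2(n));
  return 0;
}

Both numbers come out to 5120, i.e. (N/2) * log2(N).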

Anyway, the exact analysis of the algorithm isn't that important :). It is
much better than the other suggestions (deltify against head, or
'checkpoint' revisions).

> * A lot of files stabilize by the time they reach 32 node-revisions,
> and those files experience no cost.

This is a great point! I think this is the real reason you aren't seeing a
larger repository size. That, and the fact that re-deltifying across two
diffs can produce something shorter than the two diffs simply concatenated
(e.g. when the same lines are edited in consecutive revisions, the combined
delta records only the net change).

>> However, I'd like to point out that there's a simpler way than skip
>> deltas to speed up access to old versions, and keep commit times
>> reasonably fast. Simply store every N'th revision of a file as a
>> self-referent delta -- actually, a vdelta-compressed fulltext.
>
> This is not as space-efficient as skip-deltas (by order of growth).
> It's particularly bad for growing files like ChangeLogs.

Yeah, I don't like this idea. See the thread I referred to above for more
discussion of why it is bad; you will probably need to look at posts earlier
in that thread than the one I linked.

Later,

\x/ill :-}
