On 26/7/02 11:59 PM, "Greg Hudson" <ghudson@MIT.EDU> wrote:
> (Replies to several messages here.)
> 
> Branko and Glenn are making a mountain out of a molehill on commit
> costs.  The average number of redeltifications is 1.75, up from 1.  The
> largest number is log(N), but you get averaging over files as well as
> over commits (unless you commit a lot of stuff precisely in sync with
> each other).
>
> And redeltifications are only one of several things which go on during a
> commit.  As I observed in my timing tests, even without the noise of WC
> overhead (i.e. using "svn load"), commit time only goes up by about 5%
> with our repository, and no higher than about 17% in the extreme case
> where we get massive benefits.
> 
> This just isn't going to be a problem.
*points* What he said. :)
>> William Uther wrote:
>>> Note that I've moved the "prevnode = node" inside the outer loop.
> 
> But "count" is not reset either.  So we do 1 2 4 8 16 32 back.  I'll add
> a comment.
Comment would be goodness.
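For concreteness, here's my reading of that loop as a standalone sketch (the names are made up, this isn't the actual libsvn_fs code).  With "count" not reset, the strides back from the new head go 1, 2, 4, 8, 16, 32:

```python
def doubling_walk(head_index):
    """Collect the node-revision indices we land on when we step back
    1, then 2, then 4, ... from the new head.  'count' is deliberately
    NOT reset inside the loop, so the strides double and the visited
    nodes sit 1, 3, 7, 15, 31, 63 behind the head."""
    targets = []
    count = 1
    pos = head_index
    while pos - count >= 0:
        pos -= count          # "prevnode = node" moved outside the inner loop
        targets.append(pos)
        count *= 2
    return targets

# doubling_walk(100) -> [99, 97, 93, 85, 69, 37]
```

Each index in "targets" is a node-revision that gets re-deltified on this commit.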
>>> Remember that the average slowdown in commit speed is O(log(log(N)))
>>> - we do log N re-deltifications every log N steps.  That's REALLY
>>> small.
> 
> No, it's a constant factor.
Not sure I believe you.  But either way it is small.
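To put a number on it, here's a toy tally (my own model, not measurements from the thread): assume node p's arrow doubles to length 2^k when the head reaches p + 2^k, as long as 2^k divides p (its skip-list level).  Counting those re-deltification events per commit:

```python
def avg_redeltifications(n):
    """Average re-deltification events per commit over a history of n
    node-revisions, in the toy model above.  The average stays below 2
    no matter how large n gets -- a constant factor, not O(log n)."""
    total = 0
    for p in range(1, n):
        k = 0
        # node p's arrow extends to length 2^k while 2^k divides p
        while p % (1 << k) == 0 and p + (1 << k) <= n:
            total += 1
            k += 1
    return total / n

# avg_redeltifications(64)   -> 1.875
# avg_redeltifications(4096) -> ~1.997 (still bounded by 2)
```

So Greg's "constant factor" holds in this model: the average creeps toward 2 as N grows, which is in the same ballpark as his measured 1.75.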
>>> One strange thing: I thought skip-lists would cost more in terms of
>>> space.  I was thinking about a factor of 2 (hmm - or should that be
>>> N log N?).  Any thoughts on why you aren't seeing this?
> 
> * This isn't a skiplist, since the only goal is to get from any given
> node-revision to the head (or other plaintext).  We only have to keep
> the longest arrow pointing out of a node-revision.
Yes.  The picture I have in my head is:
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgId=162454
I still don't believe your constant factor.  In each row you have X arrows
of length Y for a total of X*Y arrow length per row:
Row    X        Y          X*Y
1      N/2      1          N/2
2      N/4      2          N/2
3      N/8      4          N/2
4      N/16     8          N/2
i      N/2^i    2^(i-1)    N/2
The total length of arrow is the sum of the rows.  There are log(N) rows,
each with N/2 arrow length, giving O(N log(N)) total arrow length.  That's a
factor of log(N) more, not constant.
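That row-by-row sum is easy to check directly.  In a toy version of the picture (node p keeps one arrow whose length is its lowest set bit, so row i holds the nodes with exactly i-1 trailing zero bits):

```python
import math

def total_arrow_length(n):
    """Sum of all arrow lengths in the skip-delta picture over n
    node-revisions: node p's single arrow has length p & -p
    (its lowest set bit, 2^tz(p))."""
    return sum(p & -p for p in range(1, n))

# For n a power of two the rows sum exactly as in the table:
# N/2 per row, times log2(N) rows.
n = 1024
assert total_arrow_length(n) == (n // 2) * int(math.log2(n))  # 5120
```

Total arrow length grows like (N/2)*log2(N), i.e. O(N log(N)), matching the table.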
Anyway, the exact analysis of the algorithm isn't that important :).  It is
much better than the other suggestions (deltify against head, or
'checkpoint' revisions).
> * A lot of files stabilize by the time they reach 32 node-revisions,
> and those files experience no cost.
This is a great point!  I think this is the real reason you aren't seeing
a larger repository size.  That, and the fact that the re-deltification of
two diffs can be shorter than the straight concatenation of those diffs.
>> However, I'd like to point out that there's a simpler way than skip
>> deltas to speed up access to old versions, and keep commit times
>> reasonably fast. Simply store every N'th revision of a file as a
>> self-referent delta -- actually, a vdelta-compressed fulltext.
> 
> This is not as space-efficient as skip-deltas (by order of growth).
> It's particularly bad for growing files like ChangeLogs.
Yeah, I don't like this idea.  Go read the thread I referred to above for
more discussion of why it is bad; note that you'll need to look at the
posts before the one I linked to.
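To illustrate Greg's order-of-growth point, here is a toy space model (my own numbers, not real vdelta output) for a ChangeLog-style file that grows by one unit per revision, so a fulltext of revision r costs about r and a delta costs about its arrow length:

```python
def space_checkpoint(n, k=32):
    """Toy space cost of the 'every k-th revision is a fulltext'
    scheme for a file growing one unit per revision: fulltexts cost
    ~r, the in-between deltas cost ~1.  Grows like n^2 / k."""
    return sum(r if r % k == 0 else 1 for r in range(1, n + 1))

def space_skip(n):
    """Toy space cost under skip-deltas for the same file: one head
    fulltext (~n) plus one delta per node costing ~its arrow length
    (lowest set bit).  Grows like n * log(n)."""
    return n + sum(p & -p for p in range(1, n))

# space_checkpoint(4096) -> 268160
# space_skip(4096)       ->  28672
```

At 4096 revisions the checkpoint scheme already costs nearly an order of magnitude more in this model, and the gap keeps widening as the file grows.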
Later,
\x/ill        :-}
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Jul 27 06:43:06 2002