Re: CPU usage during commits

From: John Szakmeister <john_at_szakmeister.net>
Date: 2006-05-25 10:12:41 CEST

----- Simon Butler <simon@icmethods.com> wrote:
[snip]
> can anyone explain why we need a binary diff at the repos? what is
> the algorithm for determining which previous revision is diffed for
> the FSFS repos delta? it seems like this is redundant

The binary diff is there to help reduce storage requirements. The algorithm that picks which earlier revision to diff against is called skip-deltas. Documentation about it can be found here:
http://svn.collab.net/repos/svn/trunk/notes/skip-deltas

If I understand correctly, you believe it's redundant because we compute one delta to send over the wire and then another one for storage in the repos. There are two because they serve different purposes. Sending a delta over the wire is WAN friendly: it is typically much smaller than the full-text.

The reason we don't store those same deltas in the repos comes down to algorithmic efficiency. If we stored the deltas the client sent, each one would be against the immediately preceding revision of the file, so we would need to combine the deltas for every single revision in order to produce the full-text. The more revisions you have, the more deltas need to be combined, and the cost grows linearly. OTOH, using skip-deltas, the Subversion team has been able to limit that to O(lg(N)) behavior, which is much more favorable when you start talking about repositories with hundreds of thousands of revisions (think of the GCC and ASF repositories).
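To make the difference concrete, here is a rough sketch in Python. It assumes the base-selection rule as I understand it from the skip-deltas notes for FSFS (take the file's change count, clear its lowest set bit, and delta against that); the function names are mine and are not part of Subversion. It only counts how many deltas must be combined to rebuild a full-text, it does not do any real diffing.

```python
def skip_delta_base(count: int) -> int:
    """Delta base for change number `count`: clear the lowest set bit.

    Assumption: this mirrors the rule described in notes/skip-deltas.
    """
    return count & (count - 1)


def skip_delta_chain(count: int) -> list[int]:
    """Bases visited when rebuilding change `count` from skip-deltas."""
    chain = []
    while count > 0:
        count = skip_delta_base(count)
        chain.append(count)
    return chain


def linear_chain(count: int) -> list[int]:
    """Bases visited if every delta were against the previous change."""
    return list(range(count - 1, -1, -1))


if __name__ == "__main__":
    n = 54
    print("skip-delta chain:", skip_delta_chain(n))
    print("linear chain length:", len(linear_chain(n)))
```

For change 54 the skip-delta chain visits 52, 48, 32 and 0, so four deltas are combined, where the linear scheme would combine 54. The chain length equals the number of 1 bits in the change count, which is why it stays within about lg(N).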

> > I understand that Subversion has a good binary delta algorithm and
> > that it is used for all files, text or binary, and that
> > improvements in the algorithm are certainly welcomed.
>
> reducing redundant delta calculation would be the first step here.

-John

Received on Thu May 25 10:14:02 2006