Re: Reality of delta benefit

From: Zack Weinberg <zack_at_wolery.cumb.org>
Date: 2000-08-26 00:55:47 CEST

On Fri, Aug 25, 2000 at 04:58:19PM -0500, Jim Blandy wrote:
>
> > At the risk of broken-record-ness, the SCCS weave format is
> > consistently smaller than RCS format for the same set of versions, and
> > takes the same time to extract any revision.
>
> In the original RCS paper, they explain why they think RCS is superior
> to SCCS. Essentially, the time to extract any revision from SCCS is
> proportional to the total number of pieces in the SCCS weave, whereas
> the time to extract a revision from RCS depends only on how distant
> that version is from the trunk --- not on the total number of
> revisions present. Since most accesses are to recent versions, this
> is the right tradeoff.

Only in the absence of branches. The time to extract a revision in
RCS is proportional to the distance from the HEAD revision, not just
the trunk. To get the tip of a branch, you have to execute reverse
diffs back to the branchpoint, then forward diffs from the branchpoint
all the way to the tip of branch. Creating a branch involves
rewriting most of the body.

SCCS in contrast can extract the tip of any branch in the same time as
it takes to extract tip of trunk. (Which is the same time it takes to
extract any revision at all.) It can create a branch at any position
by adding one metadata record, and do a trivial merge between any two
revisions (one for which rcsmerge would generate no conflicts) by
adding one metadata record.

If you choose an implementation that gives you dirt cheap branches,
you discover all sorts of neat things you can do with them. For
example, the equivalent of a 'cvs update' operation in Bitkeeper, when
you have local changes, looks like this:

0) You have a tree like so:

1.1 -- 1.2 -- 1.3 -- 1.4 (head)

1) Commit your own patches to your private copy of the s.files.

1.1 -- 1.2 -- 1.3 -- 1.4 -- 1.5

2) Push those patches onto a new branch.

1.1 -- 1.2 -- 1.3 -- 1.4
+--- 1.4.1.1

3) Acquire the new 1.5 revision.

1.1 -- 1.2 -- 1.3 -- 1.4 -- 1.5
\ -- 1.4.1.1

4) Merge 1.4.1.1 back into 1.5, producing 1.6.

1.1 -- 1.2 -- 1.3 -- 1.4 -- 1.5 -- 1.6
\ -- 1.4.1.1 -- /

This has a huge advantages over the 'mangle everything together in the
working files' approach CVS takes: your original changes are still
available as revision 1.4.1.1. If you ever need to go back and
disentangle your changes from my changes from Ed's changes, you have
all the information you need to do it. If your patch is not
immediately accepted to the main repository, you just keep going; the
next time you update, the "1.6" revision will get pushed onto the the
1.4.1.x branch.

1.1 -- 1.2 -- 1.3 -- 1.4 -- 1.5 -- \ -- 1.6 -- 1.7 ...
\ -- 1.4.1.1 -- 1.4.1.2 -- 1.4.1.3 ...

And you never have to do that chunk of the merge again, it
remembers. (Just like we say SVN will, on the webpage, for
inter-repository merges.)

Or if the changes in 1.5 turned out to be broken, you can go back to
1.4.1.1 and keep working from there until the main tree is functional
again.

  1.1 -- 1.2 -- 1.3 -- 1.4 -- 1.5 --- 1.6 -- 1.7 ...
                        | \---------1.5.1.1 (busted)
                        \ -- 1.4.1.1 --- / --- 1.4.1.2 -- 1.4.1.3 ...

I might also add that implementation details of both RCS and SCCS
swamp the intrinsic performance characteristics of the file formats.
Theoretically, RCS's complexity is

   RCS(HEAD) = O(number of lines in HEAD)
   RCS(rev) = O(RCS(rev+1) + PATCH(rev+1, rev))
   PATCH(a, b) >= O(max(lines in A, lines in B)).

So RCS winds up being O(m*n) where m is the distance to HEAD, and n is
the number of lines in the longest revision between the one you want
and the HEAD. SCCS is just

SCCS(rev) = O(total number of lines in s.file)

Ideal implementations of RCS would therefore be faster than SCCS for
very short distances from the HEAD.

In practice, RCS reads the entire ,v file whether it needs to or not,
which means it is never faster than a good implementation of SCCS.
However, the SCCS that comes with SysV Unix has got a huge constant
factor on its operations. So RCS can *appear* to be faster than SCCS
for the same task.

zw
Received on Sat Oct 21 14:36:07 2006

This message: [ Message body ]
Next message: Jim Blandy: "Re: Reality of delta benefit"
Previous message: Jim Blandy: "Re: CVS update: subversion/subversion/include svn_delta.h"
In reply to: Jim Blandy: "Re: Reality of delta benefit"
Next in thread: Jim Blandy: "Re: Reality of delta benefit"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]