
Re: 'svnadmin load' doesn't deltify enough.

From: John Aldridge <jpsa_at_jjdash.demon.co.uk>
Date: 2004-04-16 20:40:31 CEST

In message <85zn9c9gkh.fsf@newton.ch.collab.net>, kfogel@collab.net
writes
>Let's start from the beginning:

  (snip very helpful explanation of subversion deltification)

>If someone now commits to /trunk/bar/blah.txt to create r4, *then* the
>tip of trunk and the tip of branch will both have fulltexts, because
>starting in r3, the two blah.txt files were no longer sharing storage.

  :

>It doesn't seem likely to me that the extra fulltexts on branch tips
>could account for the kinds of storage size differences we're seeing
>here, anyway. I mean, yeah, if you create a lot of branches, and make
>commits to many different files on each branch (as opposed to many
>commits on a few files), then yes, it could affect total storage by
>these amounts.

  :

And, in message <85n05c6ig0.fsf@newton.ch.collab.net>
>How does being a former RCS repository imply that every file has a
>commit on every branch? Shouldn't it only have a commit if the file
>was modified since being branched?

Let me explain the hole we've dug ourselves into here, in the hope that
someone can suggest something...

Development occurs on the RCS trunk. When we come to release time (say
version 6.0) then, for every file in the repository, we drop a label on
the tip of the trunk...

   rcs -nV60: *

We set up a branch label starting at that point in case we need to issue
any patches...

   rcs -nV60X:V60.60 *

And we force a revision onto that branch...

   co -rV60 *
   ci -rV60X -m"V6.0.* development branch" -f *

Before continuing normal development on the mainline.

To be specific, suppose that, for a particular file, version 6.0 used
revision 1.17. We now have

   The revision label V60 = 1.17
   The branch label V60X = 1.17.60
   And an actual revision 1.17.60.1 essentially identical to 1.17

Why do we force a revision onto the branch? Because a checkout of the
V60X branch label will not succeed unless there's at least one revision
there (specifically, it does not fall back to check out the branch point
on the trunk).

I believe that, although CVS uses RCS format files to store data, it has
some smarts to avoid creating the branch for a file until it is actually
needed. Using RCS "raw" makes this a difficult strategy to manage.

The net result is that pretty much every file in our repository has
about 5 branches (one for each release), and that these branches /all/
contain at least one actual revision which is identical to the trunk
revision at which the branch is rooted. The vast majority of files
contain just this one revision on each branch.

The RCS strategy of storing backwards differences down the trunk, but
then forwards differences up branches makes this a reasonably efficient
strategy. Unfortunately, it seems to be a use-case which is not well
supported by subversion.
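
To put rough numbers on that, here is a toy model, not a measurement.
It assumes one file of a fixed size, one forced revision per branch
that is identical to its branch point, and (following Karl's
description) a fulltext at every branch tip on the Subversion side. It
ignores trunk history, compression and per-revision overhead; the
function names and the 10,000-byte file size are invented for
illustration.

```python
def rcs_style_bytes(file_size: int, n_branches: int, delta_size: int = 0) -> int:
    """One fulltext at the trunk tip plus one forward delta per branch.

    The forced branch revision is identical to the branch point, so its
    forward delta is assumed to be (nearly) empty: delta_size defaults to 0.
    """
    return file_size + n_branches * delta_size


def fulltext_tips_bytes(file_size: int, n_branches: int) -> int:
    """One fulltext at the trunk tip plus one fulltext at each branch tip."""
    return file_size * (1 + n_branches)


if __name__ == "__main__":
    size, branches = 10_000, 5   # hypothetical file size and release count
    print("forward deltas up branches:", rcs_style_bytes(size, branches))
    print("fulltext at every tip:     ", fulltext_tips_bytes(size, branches))
```

Under those assumptions, five branches cost six times the single
fulltext, which is the kind of growth we are seeing.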

As I understand Karl's explanation, though, there seems to be nothing in
the subversion data structure which "knows" that deltas go backwards
from the tip. Is there anything I (or the cvs2svn authors, for that
matter) can do to cause branch deltas to be built forwards from the
branch point?

I also still don't understand the purpose of the "svnadmin deltify"
command. When would I want/need to use this?

I think our fallback strategy is to remove the branches from the RCS
files before we import them into subversion, and settle for keeping the
original RCS data around in case we need to do any detailed research
about anything outside the trunk. I'd rather not do this if it can be
avoided, though.

-- 
Cheers,
John
Received on Fri Apr 16 20:41:58 2004

This is an archived mail posted to the Subversion Users mailing list.
