
Re: 'svnadmin load' doesn't deltify enough.

From: <kfogel_at_collab.net>
Date: 2004-04-16 15:51:26 CEST

"Erik Huelsmann" <e.huelsmann@gmx.net> writes:
> Max Bowsher <maxb@ukf.net> writes
> > The problem is that each branch is creating another set of fulltexts in
> > the repository.
> >
> > I don't know how deltification is supposed to work with branches,
> > hopefully someone can explain that?
>
> Just guessing here but Subversion does not have any idea about the concepts
> of trunk and branches. So what you see is that Subversion optimizes accesses
> by keeping HEAD undeltified, whether the file is in a branch or on trunk.
>
> Thinking about why the Subversion community never saw this problem before:
> We remove dead branches. Therefore the tip of the branch no longer is in
> HEAD and thus can be stored as delta.

I think this might not be quite right.

Let's start from the beginning:

When a branch is created (by cvs2svn or otherwise), no new file
fulltexts should be created because the branch is merely a copy or a
set of copies. At least one new directory node would be created, of
course; and more if the branch had to be "patched up" a lot to contain
the exact set of files. But file fulltexts? No, shouldn't be any new
ones.

And if you commit to a branch, the predecessor node should get
deltified against the new head of the branch: libsvn_fs has no
preference to deltify against HEADs in /trunk versus HEADs elsewhere.

Take the following repository:

    r1: /trunk/
           /foo/
           /foo/qux.txt
           /bar/
           /bar/blah.txt
        /branches/

Now r2 makes a branch of /trunk. The arrows show where storage is shared:

    r2: /trunk/
           /foo/  <-------------------------.
           /foo/qux.txt  <---------.        |
           /bar/  <----------------|-----.  |
           /bar/blah.txt <---------|--.  |  |
        /branches/                 |  |  |  |
        /branches/mybranch/        |  |  |  |
                   /foo/ ----------|--|--|--'
                   /foo/qux.txt ---'  |  |
                   /bar/ -------------|--'
                   /bar/blah.txt -----'

This means that in r2, both 'blah.txt' files are *the same node*, as
are both 'qux.txt', and both 'foo' and 'bar' directories. (Sadly, I
think 'mybranch' is not the same node as trunk, because we had to make
a new node with a new CopyID. However, that caveat only applies to
the top node in a copy.)

r2:/branches is not the same node as r1:/branches, of course.

I'm hoping Mike Pilato or someone will sanity check all my claims
here, by the way :-).

Okay, as of r2, both .txt nodes are fulltexts (notice the language:
there are four .txt files, but only two nodes for those four files).
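
To make the "four files, two nodes" point concrete, here is a toy
sketch in Python. The Node class and the copy_dir and walk_files
helpers are made up for illustration and are not libsvn_fs data
structures; the only thing modeled is that a copy creates one new top
node whose entries point at the already existing children.

```python
# Toy model only: hypothetical names, not the real libsvn_fs schema.
class Node:
    def __init__(self, name, children=None, text=None):
        self.name = name
        self.children = children or {}   # entry name -> Node (directories)
        self.text = text                 # file contents, None for directories


def copy_dir(src, new_name):
    """Cheap copy: one new top node that points at the same child nodes."""
    return Node(new_name, children=dict(src.children))


def walk_files(node, prefix=""):
    """Yield (path, node) for every file reachable from `node`."""
    path = prefix + "/" + node.name
    if node.text is not None:
        yield path, node
    for child in node.children.values():
        yield from walk_files(child, path)


# r1: the trunk from the example repository.
qux = Node("qux.txt", text="qux contents")
blah = Node("blah.txt", text="blah contents")
foo = Node("foo", {"qux.txt": qux})
bar = Node("bar", {"blah.txt": blah})
trunk = Node("trunk", {"foo": foo, "bar": bar})

# r2: the branch is a copy of trunk.
mybranch = copy_dir(trunk, "mybranch")

files = list(walk_files(trunk)) + list(walk_files(mybranch, "/branches"))
distinct_nodes = {id(node) for _, node in files}

print(len(files), "file paths,", len(distinct_nodes), "file nodes")
```

Only 'mybranch' itself is a new object in this model; everything
below it is shared with trunk, which matches the caveat about the top
node of a copy.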

Now we commit a change to blah.txt on the branch, creating r3:

    r3: /trunk/
           /foo/  <----------------------.
           /foo/qux.txt  <---------.     |
           /bar/                   |     |
           /bar/blah.txt <---------|-----|--- This is now deltified against
        /branches/                 |     |    /branches/mybranch/bar/blah.txt
        /branches/mybranch/        |     |
                   /foo/ ----------|-----'
                   /foo/qux.txt ---'
                   /bar/
                   /bar/blah.txt

Why does /trunk/bar/blah.txt get deltified when someone makes a commit
to /branches/mybranch/bar/blah.txt? Because inside the filesystem,
the original files were the same node: fulltext vs deltatext is merely
the "representation" of that node in the database. Since a commit to
/branches/mybranch/bar/blah.txt will cause the filesystem to deltify
the predecessor node (just like any commit, for Subversion doesn't
think "/branches" is special), that means the HEAD of blah.txt in
trunk will have a deltatext representation.

If someone now commits to /trunk/bar/blah.txt to create r4, *then* the
tip of trunk and the tip of branch will both have fulltexts, because
starting in r3, the two blah.txt files were no longer sharing storage.

(One interesting question is: would r3:/trunk/bar/blah.txt be
redeltified against r4:/trunk/bar/blah.txt, or simply be left as a
delta against the branch version of blah.txt? I don't know the answer
offhand, but it's not really related to the original issue anyway, I'm
just asking for fun. My guess is that we do not redeltify if the
storage is already deltatext -- because there's no reason to believe
we'd get better space savings.)
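
The r2 -> r3 -> r4 sequence for blah.txt can be walked through with a
small Python sketch. Everything here is a simplified assumption: the
NodeRev class and commit function are hypothetical, a "tree" is just a
dict from path to node, and the rule "leave a predecessor alone if it
is already a deltatext" encodes my guess above, not verified
filesystem behavior.

```python
# Toy model only: hypothetical names, not the real libsvn_fs code.
class NodeRev:
    def __init__(self, label):
        self.label = label
        self.rep = "fulltext"   # or ("delta", NodeRev it is a delta against)


def commit(tree, path, label):
    """Return a new tree in which `path` points at a new fulltext node.

    The predecessor node is deltified against the new node, unless it is
    already a deltatext (assumption: no redeltification).
    """
    old = tree[path]
    new = NodeRev(label)
    if old.rep == "fulltext":
        old.rep = ("delta", new)
    new_tree = dict(tree)       # other paths keep pointing at their old nodes
    new_tree[path] = new
    return new_tree


TRUNK = "/trunk/bar/blah.txt"
BRANCH = "/branches/mybranch/bar/blah.txt"

# r2: both paths are the same node, stored as a fulltext.
blah = NodeRev("blah@r1")
r2 = {TRUNK: blah, BRANCH: blah}

# r3: commit on the branch; the shared predecessor becomes a delta.
r3 = commit(r2, BRANCH, "blah@r3-branch")

# r4: commit on trunk; now both tips are fulltexts.
r4 = commit(r3, TRUNK, "blah@r4-trunk")

for name, tree in (("r2", r2), ("r3", r3), ("r4", r4)):
    for path in (TRUNK, BRANCH):
        node = tree[path]
        kind = node.rep if node.rep == "fulltext" else "delta vs " + node.rep[1].label
        print(name, path, node.label, kind)
```

In this model the trunk path in r3 still points at the old node, which
is why trunk's HEAD ends up with a deltatext representation even
though nobody committed to trunk.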

So anyway, yes, if both the branch and the trunk are actively
developed, the total number of fulltexts in the filesystem goes up,
though perhaps not as quickly as one might expect, because of that
shared initial storage.

But, deleting a branch doesn't get rid of its fulltexts! The tip of a
branch still exists even after the branch has been deleted, and since
no commits are happening to those files, there's nothing to trigger
further deltification. Therefore the fact that we tend to remove dead
branches in Subversion's own repository shouldn't change the number of
fulltexts.

It doesn't seem likely to me that the extra fulltexts on branch tips
could account for the kinds of storage size differences we're seeing
here, anyway. I mean, yeah, if you create a lot of branches, and make
commits to many different files on each branch (as opposed to many
commits on a few files), then yes, it could affect total storage by
these amounts.

I haven't done the math for this particular repository, though. Maybe
my instincts are off base here?

-Karl

Received on Fri Apr 16 17:07:15 2004

This is an archived mail posted to the Subversion Dev mailing list.
