
Re: 'svnadmin load' doesn't deltify enough.

From: Max Bowsher <maxb_at_ukf.net>
Date: 2004-04-16 18:03:35 CEST

kfogel@collab.net wrote:
> "Erik Huelsmann" <e.huelsmann@gmx.net> writes:
>> Max Bowsher <maxb@ukf.net> writes:
>>> The problem is that each branch is creating another set of fulltexts in
>>> the repository.
>>>
>>> I don't know how deltification is supposed to work with branches,
>>> hopefully someone can explain that?
>>
>> Just guessing here, but Subversion does not have any idea about the
>> concepts of trunk and branches. So what you see is that Subversion
>> optimizes accesses by keeping HEAD undeltified, whether the file is in
>> a branch or on trunk.
>>
>> Thinking about why the Subversion community never saw this problem
>> before: We remove dead branches. Therefore the tip of the branch is no
>> longer in HEAD and thus can be stored as a delta.
>
> I think this might not be quite right.
>
> Let's start from the beginning:
>
> When a branch is created (by cvs2svn or otherwise), no new file
> fulltexts should be created because the branch is merely a copy or a
> set of copies. At least one new directory node would be created, of
> course; and more if the branch had to be "patched up" a lot to contain
> the exact set of files. But file fulltexts? No, shouldn't be any new
> ones.
>
> And if you commit to a branch, the predecessor node should get
> deltified against the new head of the branch: libsvn_fs has no
> preference to deltify against HEADs in /trunk versus HEADs elsewhere.
>
> Take the following repository:
>
> r1:  /trunk/
>          /foo/
>          /foo/qux.txt
>          /bar/
>          /bar/blah.txt
>      /branches/
>
> Now r2 makes a branch of /trunk. The arrows show where storage is shared:
>
> r2:  /trunk/
>          /foo/  <----------------------------.
>          /foo/qux.txt  <------------.        |
>          /bar/  <-------------------|-----.  |
>          /bar/blah.txt  <-----------|--.  |  |
>      /branches/                     |  |  |  |
>      /branches/mybranch/            |  |  |  |
>          /foo/  --------------------|--|--|--'
>          /foo/qux.txt  -------------'  |  |
>          /bar/  -----------------------|--'
>          /bar/blah.txt  ---------------'
>
> This means that in r2, both 'blah.txt' files are *the same node*, as
> are both 'qux.txt', and both 'foo' and 'bar' directories. (Sadly, I
> think 'mybranch' is not the same node as trunk, because we had to make
> a new node with a new CopyID. However, that caveat only applies to
> the top node in a copy.)
>
> r2:/branches is not the same node as r1:/branches, of course.
>
> I'm hoping Mike Pilato or someone will sanity check all my claims
> here, by the way :-).
>
> Okay, as of r2, both .txt nodes are fulltexts (notice the language:
> there are four .txt files, but only two nodes for those four files).
>
> Now we commit a change to blah.txt on the branch, creating r3:
>
> r3:  /trunk/
>          /foo/  <-------------------------.
>          /foo/qux.txt  <------------.     |
>          /bar/                      |     |
>          /bar/blah.txt  <-----------|-----|--- This is now deltified against
>      /branches/                     |     |    /branches/mybranch/bar/blah.txt
>      /branches/mybranch/            |     |
>          /foo/  --------------------|-----'
>          /foo/qux.txt  -------------'
>          /bar/
>          /bar/blah.txt
>
> Why does /trunk/bar/blah.txt get deltified when someone makes a commit
> to /branches/mybranch/bar/blah.txt? Because inside the filesystem,
> the original files were the same node: fulltext vs deltatext is merely
> the "representation" of that node in the database. Since a commit to
> /branches/mybranch/bar/blah.txt will cause the filesystem to deltify
> the predecessor node (just like any commit, for Subversion doesn't
> think "/branches" is special), that means the HEAD of blah.txt in
> trunk will have a deltatext representation.
>
> If someone now commits to /trunk/bar/blah.txt to create r4, *then* the
> tip of trunk and the tip of branch will both have fulltexts, because
> starting in r3, the two blah.txt files were no longer sharing storage.
>
> (One interesting question is: would r3:/trunk/bar/blah.txt be
> redeltified against r4:/trunk/bar/blah.txt, or simply be left as a
> delta against the branch version of blah.txt? I don't know the answer
> offhand, but it's not really related to the original issue anyway, I'm
> just asking for fun. My guess is that we do not redeltify if the
> storage is already deltatext -- because there's no reason to believe
> we'd get better space savings.)
>
> So anyway, yes, if both the branch and the trunk are actively
> developed, the total number of fulltexts in the filesystem goes up,
> though perhaps not as quickly as one might expect, because of that
> shared initial storage.
>
> But, deleting a branch doesn't get rid of its fulltexts! The tip of a
> branch still exists even after the branch has been deleted, and since
> no commits are happening to those files, there's nothing to trigger
> further deltification. Therefore the fact that we tend to remove dead
> branches in Subversion's own repository shouldn't change the number of
> fulltexts.
>
> It doesn't seem likely to me that the extra fulltexts on branch tips
> could account for the kinds of storage size differences we're seeing
> here, anyway. I mean, yeah, if you create a lot of branches, and make
> commits to many different files on each branch (as opposed to many
> commits on a few files), then yes, it could affect total storage by
> these amounts.
>
> I haven't done the math for this particular repository, though. Maybe
> my instincts are off base here?
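
The r1 to r4 walkthrough quoted above can be replayed with a toy model. This
is only a sketch, not libsvn_fs: ToyFS, Node and their methods are names
invented for illustration, directory nodes and CopyIDs are ignored, and it
assumes (per the guess in the quoted text) that a node already stored as a
delta is not redeltified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """A file node; 'rep' is how its content is stored (hypothetical model)."""
    path: str
    rep: str = "fulltext"               # "fulltext" or "delta"
    delta_base: Optional["Node"] = None


class ToyFS:
    def __init__(self) -> None:
        self.head: Dict[str, Node] = {}   # path -> node visible at HEAD
        self.nodes: List[Node] = []       # every file node ever created

    def add(self, path: str) -> None:
        node = Node(path)
        self.nodes.append(node)
        self.head[path] = node

    def copy(self, src_dir: str, dst_dir: str) -> None:
        # Cheap copy: the new paths point at the SAME nodes, no new fulltexts.
        for path, node in list(self.head.items()):
            if path.startswith(src_dir):
                self.head[dst_dir + path[len(src_dir):]] = node

    def commit(self, path: str) -> None:
        old = self.head[path]
        new = Node(path)                  # the new tip is stored as fulltext
        self.nodes.append(new)
        if old.rep == "fulltext":         # assumption: deltas are left alone
            old.rep, old.delta_base = "delta", new
        self.head[path] = new

    def fulltexts(self) -> int:
        return sum(1 for n in self.nodes if n.rep == "fulltext")


fs = ToyFS()
fs.add("/trunk/foo/qux.txt")
fs.add("/trunk/bar/blah.txt")                   # r1
fs.copy("/trunk/", "/branches/mybranch/")       # r2: branch shares nodes
shared = (fs.head["/trunk/bar/blah.txt"]
          is fs.head["/branches/mybranch/bar/blah.txt"])
r2_fulltexts = fs.fulltexts()
fs.commit("/branches/mybranch/bar/blah.txt")    # r3: commit on the branch
trunk_rep_r3 = fs.head["/trunk/bar/blah.txt"].rep
r3_fulltexts = fs.fulltexts()
fs.commit("/trunk/bar/blah.txt")                # r4: commit on trunk
r4_fulltexts = fs.fulltexts()
```

Under these assumptions the fulltext count stays at 2 through r2 and r3
(trunk's blah.txt becomes a delta in r3) and only rises to 3 in r4, when
trunk and branch each get their own tip.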

This repository is a former RCS repository, so every file has a commit on
every branch. Instant size explosion. Exceptional case? Maybe not: suppose
you modify a significant number of the files on a branch (surely quite a
common scenario, at least for long-running branches that are being merged
to). The Subversion repository will soon become larger than the equivalent
CVS repository. Subversion may need to reconsider its deltification scheme.
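
To put rough numbers on that, here is a back-of-the-envelope sketch. The
function name and the sample figures (1000 files, 20 branches) are made up
for illustration and are not measurements from this repository. It assumes
trunk keeps one fulltext per file and every branch tip holds a fulltext for
each file modified on that branch, which is an upper bound.

```python
def tip_fulltexts(num_files: int, num_branches: int,
                  touched_fraction: float = 1.0) -> int:
    """Upper-bound count of fulltexts sitting at trunk and branch tips.

    Assumption: one fulltext per file on trunk, plus one per file that
    was modified on each branch.
    """
    per_branch = round(num_files * touched_fraction)
    return num_files + num_branches * per_branch


# Every file committed on every branch (the converted-repository case).
worst_case = tip_fulltexts(1000, 20)
# Only a quarter of the files modified on each branch.
partial = tip_fulltexts(1000, 20, touched_fraction=0.25)
```

With these made-up figures the first case gives 21000 tip fulltexts and the
second 6000, against 1000 for trunk alone.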

Max.

Received on Fri Apr 16 18:05:08 2004

This is an archived mail posted to the Subversion Dev mailing list.
