
Re: Subversion branch deltification policy is more space-hungry than CVS

From: Max Bowsher <maxb_at_ukf.net>
Date: 2004-05-29 14:46:20 CEST

This is a combined reply to two emails which raised similar points:

kfogel@collab.net wrote:
> My question is: is there reason to believe that, in practice, this
> property of active Subversion branches is a significant factor in
> increased repository size?

It's not just _active_ branches, though. No space is saved when the branch
ceases to be active, and becomes only historical record.

> This isn't a question of how the Subversion filesystem works, it's a
> question about statistical patterns in real-life repositories. For
> example, if it turns out that DB overhead is using 10x the space that
> these branch tip fulltexts are using, then optimizing active tips of
> branches would be pointless.

John Peacock wrote:
> Except that conversion from cvs to svn (whatever the program used) is where
> the effect is *most* noticeable. Historically I believe that has been a
> byproduct of the conversion process itself; if cvs2svn is better about making
> copies instead of adding new files for branches (which is the goal, I know),
> that's wonderful.

Indeed, cvs2svn is now much better about branching.
The effect is very likely to be most noticeable when converting from cvs to
svn, because that causes the instant creation of a repository full of
history. The size of a subversion repository begun from scratch will
gradually diverge from the size of its hypothetical cvs counterpart, at a
rate dependent on the usage of branching.

> But the number of fulltexts generated per branch is going to vary wildly among
> various projects; in some cases the branch is going to be mostly node copies,
> in others it might have lots of modified files. I don't think that you have
> demonstrated that the use of fulltexts on copies is significant to outweigh
> the performance gains by not having to regenerate the files on checkout from
> the deltas.
>
> I fully and freely admit that fulltexts on modified copies will yield
> [slightly] larger repositories. I just don't think you have demonstrated yet
> that it matters in the grand scheme of things. And personally *I* am willing
> to trade [some] diskspace for performance (which is where the current scheme
> comes from, does it not?)...

OK. So far, I've been viewing this as "we use more space than CVS, isn't
that wrong?". I will stop considering the question from that viewpoint.

If we are to deliberately make a different space/time tradeoff than CVS,
then at the very least there should be some documentation to which we can
point users who get bitten by this, so they understand.

I'm willing to perform some experiments to try to quantify the size of the
problem, but I will need some help deciding what to test.

Here is a scenario particularly worrying to me:
* Suppose you refactor a piece of software on a branch. All you need is one
change to some function, which then needs to have all its callsites changed,
and you have many tiny changes scattered over many files. Each file gets
stored as a fulltext, rather than a tiny diff.
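
To give a rough feel for that scenario, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the file contents,
the names old_name/new_name, the file and line counts, and the use of a
zero-context unified diff from difflib as a stand-in for Subversion's real
delta encoding. It measures nothing about an actual repository; it only
compares the total size of the modified files against the total size of the
diffs that describe the same change.

```python
import difflib


def make_file(index, lines=200):
    """Build a hypothetical source file containing one call to old_name()."""
    body = [f"int helper_{index}_{n}(void) {{ return {n}; }}\n"
            for n in range(lines)]
    body[lines // 2] = f"    result = old_name({index});\n"
    return body


def fulltext_size(lines):
    """Characters needed to store the whole file."""
    return sum(len(line) for line in lines)


def delta_size(old, new):
    """Characters in a zero-context unified diff (a stand-in, not svn's format)."""
    return sum(len(line) for line in difflib.unified_diff(old, new, n=0))


def compare(num_files=50):
    """Rename one call site per file; return (total fulltext, total delta)."""
    total_full = 0
    total_delta = 0
    for i in range(num_files):
        old = make_file(i)
        new = [line.replace("old_name(", "new_name(") for line in old]
        total_full += fulltext_size(new)
        total_delta += delta_size(old, new)
    return total_full, total_delta


if __name__ == "__main__":
    full, delta = compare()
    print(f"fulltexts: {full} chars, deltas: {delta} chars")
```

The absolute numbers are meaningless, but the ratio between the two totals
is the kind of quantity a real experiment would need to measure on real
repositories.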

Max.

Received on Sat May 29 14:46:49 2004

This is an archived mail posted to the Subversion Dev mailing list.