This is a combined reply to two emails which raised similar points:
> My question is: is there reason to believe that, in practice, this
> property of active Subversion branches is a significant factor in
> increased repository size?
It's not just _active_ branches, though. No space is saved when the branch
ceases to be active and becomes only a historical record.
> This isn't a question of how the Subversion filesystem works, it's a
> question about statistical patterns in real-lifed repositories. For
> example, if it turns out that DB overhead is using 10x the space that
> these branch tip fulltexts are using, then optimizing active tips of
> branches would be pointless.
John Peacock wrote:
> Except that conversion from cvs to svn (whatever the program used) is
> the effect is *most* noticable. Historically I believe that has been a
> byproduct of the conversion process itself; if cvs2svn is better about
> copies instead of adding new files for branches (which is the goal, I
> that's wonderful.
Indeed, cvs2svn is now much better about branching.
The effect is very likely to be most noticeable when converting from cvs to
svn, because that causes the instant creation of a repository full of
history. The size of a Subversion repository begun from scratch will
gradually diverge from the size of its hypothetical CVS counterpart, at a
rate dependent on the usage of branching.
> But the number of fulltexts generated per branch is going to vary wildly
> various projects; in some cases the branch is going to be mostly node
> in others it might have lots of modified files. I don't think that you
> demonstrated that the use of fulltexts on copies is significant to
> the performance gains by not having to regenerate the files on checkout
> the deltas.
> I fully and freely admit that fulltexts on modified copies will yield
> [slightly] larger repositories. I just don't think you have demonstrated
> that it
> matters in the grand scheme of things. And personally *I* am willing to
> [some] diskspace for performance (which is where the current scheme comes
> from, does it not?)...
OK. So far, I've been viewing this as "we use more space than CVS, isn't
that wrong?". I will stop considering the question from that viewpoint.
If we are to deliberately make a different space/time tradeoff than CVS,
then at the very least there should be some documentation that we can point
users who get bitten by this to, so they understand.
I'm willing to perform some experiments to try to quantify the size of the
problem, but I will need some help deciding what to test.
Here is a scenario particularly worrying to me:
* Suppose you refactor a piece of software on a branch. All you need is one
change to some function, which then needs to have all its callsites changed,
and you have many tiny changes scattered over many files. Each file gets
stored as a fulltext, rather than a tiny diff.
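As a starting point for such an experiment, here is a rough sketch of how the
scenario above might be simulated. It is only an illustration: the synthetic
files, the names old_name/new_name, and the helpers make_file, storage_cost
and compare are all made up for this example, and a unified diff from
Python's difflib is used as a crude stand-in for a stored delta. It does not
model Subversion's actual delta format or any DB overhead.

```python
import difflib


def make_file(index, lines=200):
    """Build a synthetic source file containing one call to old_name()."""
    body = [f"int helper_{index}_{n}(void) {{ return {n}; }}\n" for n in range(lines)]
    # Hypothetical call site that the refactoring will touch.
    body[lines // 2] = f"    result_{index} = old_name({index});\n"
    return "".join(body)


def storage_cost(old_text, new_text):
    """Return (fulltext_bytes, diff_bytes) for storing new_text.

    The unified diff is only a rough proxy for a delta against old_text.
    """
    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="trunk",
            tofile="branch",
        )
    )
    return len(new_text.encode()), len(diff.encode())


def compare(num_files=50):
    """Total the cost of a one-line change in each of num_files files."""
    total_full = 0
    total_diff = 0
    for i in range(num_files):
        old = make_file(i)
        new = old.replace("old_name(", "new_name(")  # the tiny per-file change
        full, diff = storage_cost(old, new)
        total_full += full
        total_diff += diff
    return total_full, total_diff


if __name__ == "__main__":
    full, diff = compare()
    print(f"fulltext bytes: {full}")
    print(f"diff bytes:     {diff}")
    print(f"ratio:          {full / diff:.1f}x")
```

Real numbers would have to come from real repositories, of course; this only
shows the shape of the comparison I have in mind.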
Received on Sat May 29 14:46:49 2004