
Re: Subversion branch deltification policy is more space-hungry than CVS

From: John Pybus <john.pybus_at_zoology.oxford.ac.uk>
Date: 2004-06-02 19:50:22 CEST

kfogel@collab.net wrote:
> "Max Bowsher" <maxb@ukf.net> writes:
>
>>kfogel@collab.net wrote:

> Only if people *are* getting bitten by this :-). We should first make
> sure that this effect isn't lost in the noise of other things.

Reading the svn lists, people converting from CVS seem to think so.
This would be because they come in with long-lived repositories
containing real examples of branch usage over time (of which there must
be relatively few with SVN so far?), and because they have a definite
size to compare against. People starting new projects, or not trying to
import old history, wouldn't be expected to notice for some time, and
with no hard number to compare against wouldn't notice so strongly. (I'm
ignoring all the cvs2svn inefficiency discussions.)

>>I'm willing to perform some experiments to try to quantify the size of the
>>problem, but I will need some help deciding what to test.
>
> I don't mean to sound snide by this at all, but: having a test for the
> proposed problem should be equivalent to fully understanding the
> problem in the first place. IOW, if we don't know exactly how to test
> for the problem, then what exactly *is* the problem? A precise
> description should be a blueprint for a test. If it's not, then maybe
> the problem hasn't really been precisely described.

Surely some exploration of the space of possible branch strategies,
and the disk usage associated with each, will be more informative than
none, though clearly the usual warnings about synthetic benchmarking
would apply.

I do have a slight worry about Subversion's deltification strategy,
though perhaps this was gone through back in the mists of time, and I
just don't understand.

Say a project branches at a stable release (1.0) and then continues to
make many changes to the head, while making fixes to the 1.0 branch
which only touch a few files. Then the relative cost of operations would be:

i) Checkout of trunk: Cheap, full texts available[0], and common.

ii) Checkout of 1.0 tag: Relatively expensive, has to apply many
deltas[1], and not that likely, though maybe people would want fulltexts
as a target for diff.

iii) Checkout of 1.0 branch: Almost as expensive as ii), since most of
the nodes are shared and only a few are stored as fulltext. More common
than ii), since developers patching and packaging need access to it;
maybe even deployers would be exporting.

  [0]: Assuming BDB not FSFS here of course
  [1]: Yes I know not as many as it might be; skip-lists etc

If in case iii) the cost of accessing old nodes shared with trunk is
acceptable, what is the benefit of storing fulltexts for the changed
nodes, even on long-lived branches? On short-lived branches, the cost of
fulltexts, borne even when the branch is no longer required, seems no
more sensible.
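
To make the comparison of i), ii) and iii) concrete, here is a rough toy
model in Python. It is only a sketch under stated assumptions, not
Subversion's actual algorithm: every name in it is hypothetical, the
cost of reading a node is taken to be the number of deltas applied, a
node at the fulltext costs nothing, and the "skip" variant simply
assumes that reaching a node d revisions away from the fulltext takes
as many hops as there are set bits in d.

```python
def deltas_linear(distance):
    """Toy assumption: one delta per revision between node and fulltext."""
    return distance


def deltas_skip(distance):
    """Toy assumption: skip-list style chain, one hop per set bit."""
    return bin(distance).count("1")


def checkout_cost(distances, delta_cost):
    """Total deltas applied to check out files at the given distances."""
    return sum(delta_cost(d) for d in distances)


def scenario(n_files, trunk_changes, branch_touched, delta_cost):
    """Compare trunk, 1.0 tag and 1.0 branch checkouts.

    n_files        -- files in the tree
    trunk_changes  -- changes made to each file on trunk since 1.0
    branch_touched -- files fixed on the 1.0 branch (stored as fulltext)
    """
    trunk = checkout_cost([0] * n_files, delta_cost)
    tag = checkout_cost([trunk_changes] * n_files, delta_cost)
    branch = checkout_cost(
        [0] * branch_touched + [trunk_changes] * (n_files - branch_touched),
        delta_cost,
    )
    return {"trunk": trunk, "tag": tag, "branch": branch}


if __name__ == "__main__":
    for name, cost in (("linear", deltas_linear), ("skip", deltas_skip)):
        print(name, scenario(100, 7, 5, cost))
```

In this model the branch checkout stays within a few percent of the tag
checkout in both variants, because the few fulltext nodes on the branch
barely change the total. That is the point of the question above.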

John

Received on Wed Jun 2 19:52:32 2004

This is an archived mail posted to the Subversion Dev mailing list.
