
Re: The small commit problem

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2003-07-29 22:12:36 CEST

cmpilato@collab.net writes:

> Philip Martin <philip@codematters.co.uk> writes:
>
>> psize=`du -ks repostress | awk '{print $1}'`
>
> It would appear that your script is measuring just the sheer size of
> the database, log files and all (unless stress.pl does log removal).
> This means you're measuring not just the growth of the repository, but
> the growth of all the intermediate loggish steps taken to change that
> repository. So yes, you can expect that to grow almost linearly based
> on the size of the change. I mean, as each new edit or move
> comes in, we are replacing the directory entries list. That's a
> write-ahead log action of (probably) the entire new entries list *for
> each edit*.

The script itself (not stress.pl) does BDB checkpointing and log file
removal.
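A minimal sketch of what such a measurement step might look like, assuming the Berkeley DB environment lives in the repository's db subdirectory and that the standard db_checkpoint and db_archive utilities are on the PATH. The function names are hypothetical and the actual script is not shown in this thread, so treat this as an illustration rather than the script itself:

```python
import subprocess

REPO = "repostress"  # repository path, taken from the quoted du command


def parse_du_kb(du_output):
    """Return the first field of `du -ks` output as an integer number of KB.

    This mirrors the `awk '{print $1}'` step in the quoted command.
    """
    return int(du_output.split()[0])


def cleanup_commands(repo=REPO):
    """Build the commands assumed here for checkpointing and log removal.

    Assumption: the BDB environment is in <repo>/db.
    """
    db_dir = f"{repo}/db"
    return [
        ["db_checkpoint", "-1", "-h", db_dir],  # force one checkpoint, then exit
        ["db_archive", "-d", "-h", db_dir],     # delete log files no longer needed
    ]


def repo_size_kb(repo=REPO):
    """Checkpoint, remove unused logs, then measure the repository with du."""
    for cmd in cleanup_commands(repo):
        subprocess.run(cmd, check=True)
    result = subprocess.run(
        ["du", "-ks", repo], check=True, capture_output=True, text=True
    )
    return parse_du_kb(result.stdout)
```

Calling repo_size_kb() before and after each batch of commits would give the kind of before/after pairs reported below, with the log files already removed from the measurement.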

Looks like I overestimated the values (I didn't properly account for
those steps when the repository shrank) but the problem is real.
Here's what I get moving a file in a directory of 100 files:

1400 2332 932
2332 2172 -160
2172 3020 848
3020 3888 868
3888 3704 -184
3704 4564 860
4564 5404 840
5404 5252 -152
5252 6100 848
6100 6964 864
6964 6796 -168
6796 7656 860
7656 8504 848
8504 9360 856
9360 9212 -148
9212 10088 876
10088 10980 892
10980 10800 -180
10800 11684 884
11684 12552 868
12552 12408 -144
12408 13292 884

Each line covers 100 commits: the first two columns are the repository
size in KB before and after, and the third is the difference. Over 30
log files were used. That's 2200 moves, and the repository has grown
from 1400k to 13292k, about 5k per commit. Repeating the test with a
directory of 50 files instead of 100, I get an average of about 3k per
commit.
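The per-commit figure can be checked directly from the table. The sketch below copies the first two columns of each row and divides the net growth by the number of commits; the names ROWS and average_growth_kb are only illustrative:

```python
# (size before, size after) in KB for each batch of 100 commits,
# copied from the first two columns of the table above.
ROWS = [
    (1400, 2332), (2332, 2172), (2172, 3020), (3020, 3888),
    (3888, 3704), (3704, 4564), (4564, 5404), (5404, 5252),
    (5252, 6100), (6100, 6964), (6964, 6796), (6796, 7656),
    (7656, 8504), (8504, 9360), (9360, 9212), (9212, 10088),
    (10088, 10980), (10980, 10800), (10800, 11684), (11684, 12552),
    (12552, 12408), (12408, 13292),
]
COMMITS_PER_ROW = 100


def average_growth_kb(rows, commits_per_row=COMMITS_PER_ROW):
    """Net repository growth in KB divided by the total number of commits."""
    total_growth = sum(after - before for before, after in rows)
    total_commits = len(rows) * commits_per_row
    return total_growth / total_commits


if __name__ == "__main__":
    print(f"{average_growth_kb(ROWS):.2f} KB per commit")
```

Because the shrinking steps are included in the sum, the result is the net growth per commit, which comes out a little above the rounded 5k figure.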

This means that small changes are relatively expensive in a Subversion
repository. Each large directory that contributes to a commit is
going to add several KB to the repository size. A one-line change to a
header file and a source file in separate directories could well add
10K to the repository.
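As a rough illustration of that arithmetic, the sketch below estimates the overhead of a commit from the sizes of the directories it touches. It assumes the cost varies linearly between the two measured points (about 3k for 50 entries, about 5k for 100 entries); that linear model is an assumption made for this example, not something the measurements establish, and the function names are hypothetical:

```python
# Measured averages from the tests above: (directory entries, KB per commit).
MEASURED_SMALL = (50, 3.0)
MEASURED_LARGE = (100, 5.0)


def directory_overhead_kb(entries):
    """Estimate KB added per commit for one directory with `entries` entries.

    Assumption: a straight line through the two measured points.
    """
    (x0, y0), (x1, y1) = MEASURED_SMALL, MEASURED_LARGE
    return y0 + (entries - x0) * (y1 - y0) / (x1 - x0)


def commit_overhead_kb(directory_sizes):
    """Sum the estimated overhead over every directory a commit changes."""
    return sum(directory_overhead_kb(n) for n in directory_sizes)


if __name__ == "__main__":
    # A header file and a source file in two separate 100-entry directories.
    print(commit_overhead_kb([100, 100]))
```

Under this model a commit touching two 100-entry directories costs about twice the single-directory figure, which is where the 10K estimate comes from.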

When converting a large CVS repository those few KB are an additional
overhead that Subversion is likely to impose on a large number of the
commits. Joey's debhelper repository ended up with about 450 tags.

-- 
Philip Martin
Received on Tue Jul 29 22:13:32 2003

This is an archived mail posted to the Subversion Dev mailing list.