Hello
I have been looking at a cvs2svn conversion and wondering why the
Subversion repository is so much larger than the CVS one. One of the
things that occurred to me is that the creation of a directory node in
the Subversion filesystem might make small commits relatively
expensive, particularly if the directory has a large number of
elements.
I have been experimenting with scripts like the one at the end of this
mail. It creates a simple repository containing a number of files and
then makes lots of "small" changes to measure how the repository
grows. I have tried both renaming a file (one rename per commit) and
editing a file (append a few bytes to one file per commit).

  Files in      Effect on repository size
  directory
      10    :   about 10k per edit or move
      50    :   over 15k per edit, about 15k per move
     100    :   over 15k per edit or move
     200    :   about 20k per edit or move
     500    :   about 30k per edit, over 45k per move

I don't think 50, or even 200, is a large number of files to have in a
directory. Due to the way changes "bubble-up" through the Subversion
filesystem, the effect is amplified if the directory in question is
itself a child of a directory with lots of elements.
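
To make the "bubble-up" concrete: when one file changes, every directory between that file and the root gets a new node in the new revision. The following is only a sketch; the helper name and the example path are made up for illustration and are not part of Subversion.

```shell
#!/bin/bash
# Hypothetical helper: list the directories that get a new node when the
# file at the given repository path changes (parent first, root last).
rewritten_dirs() {
    local path=${1#/}          # strip a leading slash, if any
    while [[ $path == */* ]]; do
        path=${path%/*}        # drop the last path component
        echo "/$path"
    done
    echo "/"                   # the root directory is always rewritten
}

# Example path, invented for illustration.
rewritten_dirs /trunk/lib/foo1
```

Each directory listed pays a cost that grows with its number of entries, so a deep path through several large directories multiplies the per-commit overhead.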

Thus a Subversion repository doesn't handle "small" commits
particularly well: each commit has a sort of minimum size, however
small the change itself is. This could explain why we are getting
reports that CVS repositories convert to much larger Subversion
repositories.
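
As a rough illustration of what such a per-commit minimum adds up to, the arithmetic can be sketched as below. The 20k figure is the measurement for a 200-file directory from the table; the commit count is a made-up example, not taken from any real conversion.

```shell
#!/bin/bash
# Rough estimate of repository growth caused by small commits alone.
# Arguments: number of commits, measured growth per commit in kilobytes.
estimate_growth_mb() {
    local commits=$1 per_commit_kb=$2
    echo $(( commits * per_commit_kb / 1024 ))   # integer megabytes
}

# 10000 commits is a hypothetical value; 20 is "about 20k per edit or move".
estimate_growth_mb 10000 20
```

This ignores the size of the changes themselves, so it only shows the overhead that the directory nodes would contribute.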

Does that sound plausible? If it does, I wonder what we could do to
change it: make the nodes less expensive, or use some sort of "diffy"
directory storage, or...

Script follows:

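#!/bin/bash
# Measure how a repository grows under many small commits.
STRESS=~/sw/subversion/svn/tools/dev/stress.pl
CHECK=db4.1_checkpoint

# Create the test repository "repostress" (here with 200 files).
$STRESS -n0 -c -F200 -N1 -D0
REPO=file://`pwd`/repostress
rm -rf wc
# The edit test needs a working copy; uncomment to check one out.
#svn co $REPO/trunk wc &> /dev/null

# Checkpoint the database and delete unused log files so that they
# do not distort the size measurement.
$CHECK -1 -h repostress/db
rm -f `svnadmin archive repostress`
psize=`du -ks repostress | awk '{print $1}'`

for i in `seq 100` ; do
  for j in `seq 5` ; do
    # Edit test: two commits, each appending a few bytes to foo1.
    #echo $i"x"$j >> wc/foo1 && svn ci -m "" wc &> /dev/null
    #echo $i"x"$j >> wc/foo1 && svn ci -m "" wc &> /dev/null
    # Move test: two commits, renaming foo1 and then renaming it back.
    svn mv -m "" $REPO/trunk/foo1 $REPO/trunk/xfoo1 &> /dev/null
    svn mv -m "" $REPO/trunk/xfoo1 $REPO/trunk/foo1 &> /dev/null
  done
  $CHECK -1 -h repostress/db
  rm -f `svnadmin archive repostress`
  nsize=`du -ks repostress | awk '{print $1}'`
  # Print previous size, new size and growth (in kB) for these 10 commits.
  echo $psize $nsize $(($nsize-$psize))
  psize=$nsize
done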
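
Each output line of the script covers 10 commits (5 inner iterations, 2 commits each), so the per-commit figures in the table come from dividing the growth by 10. A small sketch of that calculation follows; the function name is hypothetical and the sample numbers are invented, not real measurements.

```shell
#!/bin/bash
# Average growth per commit, in kilobytes, from the script's output.
# Input lines have the form "previous-size new-size delta"; each line
# covers 10 commits. Reads the named files, or stdin if none are given.
avg_growth_per_commit() {
    awk -v commits_per_line=10 '
        NF == 3 { total += $3; lines++ }
        END { if (lines) printf "%.1f\n", total / (lines * commits_per_line) }
    ' "$@"
}

# Example with made-up numbers in the script's output format.
printf '%s\n' "1000 1200 200" "1200 1390 190" | avg_growth_per_commit
```

Averaging over all 100 lines smooths out the jumps that occur when Berkeley DB allocates new pages.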
--
Philip Martin
Received on Tue Jul 29 18:48:40 2003