
The small commit problem

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2003-07-29 18:46:27 CEST

Hello

I have been looking at a cvs2svn conversion and wondering why the
Subversion repository is so much larger than the CVS one. One of the
things that occurred to me is that the creation of a directory node in
the Subversion filesystem might make small commits relatively
expensive, particularly if the directory has a large number of
elements.

I have been experimenting with scripts like the one at the end of this
mail. It creates a simple repository containing a number of files and
then makes lots of "small" changes to measure how the repository
grows. I have tried both renaming a file (one rename per commit) and
editing a file (append a few bytes to one file per commit).

Files in
directory : Effect on repository size
       10 : about 10k per edit or move
       50 : over 15k per edit, about 15k per move
      100 : over 15k per edit or move
      200 : about 20k per edit or move
      500 : about 30k per edit, over 45k per move
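
As a rough reading of the table, the growth between the 10-file and
500-file cases can be turned into an approximate cost per directory
entry. This is only a sketch: the function name is made up for
illustration, and the "about"/"over" figures are treated as exact,
which they are not.

```bash
#!/bin/bash
# Approximate bytes added per extra directory entry, taken as the slope
# between two rows of the table (sizes in kB, as reported by du -ks).
per_entry_bytes() {
  local small_kb=$1 large_kb=$2 small_files=$3 large_files=$4
  echo $(( (large_kb - small_kb) * 1024 / (large_files - small_files) ))
}

per_entry_bytes 10 30 10 500   # edits: prints 41
per_entry_bytes 10 45 10 500   # moves: prints 73
```

So each extra entry in the directory appears to add a few tens of bytes
to every commit that touches the directory.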

I don't think 50, or even 200, is a large number of files to have in a
directory. Due to the way changes "bubble-up" through the Subversion
filesystem, the effect is amplified if the directory in question is
itself a child of a directory with lots of elements.
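
The bubble-up effect can be sketched with a toy model. It assumes every
directory on the path from the changed file up to the root gets a new
node, and that each new node costs a fixed amount per entry. The
function name and both constants are illustrative assumptions, not
measurements of the Subversion filesystem.

```bash
#!/bin/bash
# Toy model of commit cost: a fixed overhead per commit plus a per-entry
# cost for every directory rewritten on the way up to the root.
COMMIT_OVERHEAD=10000   # assumed fixed bytes per commit
BYTES_PER_ENTRY=40      # assumed bytes per directory entry

estimate_commit_bytes() {
  local total=$COMMIT_OVERHEAD entries
  # Each argument is the entry count of one directory on the path.
  for entries in "$@" ; do
    total=$((total + entries * BYTES_PER_ENTRY))
  done
  echo "$total"
}

estimate_commit_bytes 200       # file in a directory with 200 entries
estimate_commit_bytes 200 200   # same, but the parent also has 200 entries
```

Under these assumptions the second case costs 8000 bytes more than the
first, although the change to the file itself is identical.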

Thus a Subversion repository doesn't handle "small" commits
particularly well: there is a sort of threshold on the minimum size
of each commit. This could explain why we are getting reports that
CVS repositories convert to much larger Subversion repositories.

Does that sound plausible? If it does, I wonder what we could do to
change it: make the nodes less expensive, or use some sort of "diffy"
directory storage, or...
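
To give a feel for what "diffy" directory storage might save, the
sketch below compares a full directory listing with a delta against the
previous listing after one rename. Plain text files stand in for
directory nodes here; this is not how the Subversion filesystem
actually stores them.

```bash
#!/bin/bash
# Illustration only: text listings stand in for directory nodes.
old=$(mktemp)
new=$(mktemp)

# A directory with 200 entries, then the same directory after one rename.
for i in $(seq 200) ; do echo "foo$i" ; done > "$old"
sed 's/^foo1$/xfoo1/' "$old" > "$new"

# Size of storing the new listing in full versus storing only the delta.
full_bytes=$(( $(wc -c < "$new") ))
delta_bytes=$(( $(diff "$old" "$new" | wc -c) ))
echo "full listing: $full_bytes bytes, delta: $delta_bytes bytes"

rm -f "$old" "$new"
```

The delta stays small no matter how many entries the directory has,
whereas the full listing grows with the directory.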

Script follows

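#!/bin/bash

# Paths to the Subversion stress tool and the Berkeley DB checkpoint utility.
STRESS=~/sw/subversion/svn/tools/dev/stress.pl
CHECK=db4.1_checkpoint

# Create the "repostress" repository with 200 files per directory (-F200)
# and run no stress iterations (-n0).
$STRESS -n0 -c -F200 -N1 -D0
REPO=file://$(pwd)/repostress

rm -rf wc
# Uncomment to test edits instead of renames (the edit variant needs a
# working copy).
#svn co $REPO/trunk wc &> /dev/null

# Checkpoint the database, remove unused Berkeley DB log files, and record
# the starting size in kB.
$CHECK -1 -h repostress/db
rm -f $(svnadmin archive repostress)
psize=$(du -ks repostress | awk '{print $1}')

for i in $(seq 100) ; do
  # Ten commits per batch: five passes with two commits each.
  for j in $(seq 5) ; do
    # Edit variant: append a few bytes to one file and commit.
    #echo $i"x"$j >> wc/foo1 && svn ci -m "" wc &> /dev/null
    #echo $i"x"$j >> wc/foo1 && svn ci -m "" wc &> /dev/null
    # Rename variant: move a file and move it back, directly in the repository.
    svn mv -m "" $REPO/trunk/foo1 $REPO/trunk/xfoo1 &> /dev/null
    svn mv -m "" $REPO/trunk/xfoo1 $REPO/trunk/foo1 &> /dev/null
  done

  # Checkpoint and remove unused log files again before measuring.
  $CHECK -1 -h repostress/db
  rm -f $(svnadmin archive repostress)
  nsize=$(du -ks repostress | awk '{print $1}')
  # Print previous size, new size and growth (kB) for this batch.
  echo $psize $nsize $(($nsize-$psize))
  psize=$nsize

done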
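
The script prints one line per batch, and each batch covers ten commits
(five passes of two commits). A small helper along the following lines
could turn that output into an average per commit. The function name
and the sample numbers are made up for illustration.

```bash
#!/bin/bash
# Each input line is "previous_kB new_kB delta_kB" for one batch of commits.
COMMITS_PER_BATCH=10   # 5 inner iterations x 2 commits each

average_kb_per_commit() {
  awk -v n="$COMMITS_PER_BATCH" '
    { total += $3; batches++ }
    END { if (batches) printf "%.1f\n", total / (batches * n) }'
}

# Made-up sample lines, purely to show the input format.
printf '%s\n' "1000 1200 200" "1200 1404 204" | average_kb_per_commit
```

In practice the output of the measurement script would be piped into
average_kb_per_commit instead of the sample lines.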

-- 
Philip Martin
Received on Tue Jul 29 18:48:40 2003

This is an archived mail posted to the Subversion Dev mailing list.
