
["Kirby C. Bohling" <kbohling@birddog.com>] Re: Disk/Memory Consumption of SVN

From: <cmpilato_at_collab.net>
Date: 2002-02-01 20:42:48 CET

No need for this to NOT be public (hope you agree, Kirby).

------- Start of forwarded message -------
Message-ID: <3C5AEB5D.10007@birddog.com>
Date: Fri, 01 Feb 2002 13:24:13 -0600
From: "Kirby C. Bohling" <kbohling@birddog.com>
MIME-Version: 1.0
To: cmpilato@collab.net
Subject: Re: Disk/Memory Consumption of SVN
References: <3C5AE289.2050805@birddog.com> <x7elk5f2uh.fsf@pascal.ch.collab.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Good god, you guys are responsive... The beauty of interested
developers... Every time we install the software at our shop, we tag
the CVS repository, do an export, and build from the export.

I went to a heavily updated file, did a CVS log, and grepped the output
for the tag "INSTALL_RELEASE.*", which marks every install we do. We do
frequent installs, averaging about one per business day.
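
As a rough sketch of that tag-gathering step: the function name below is made up for illustration, and it assumes that `cvs log` lists each symbolic name as an indented "TAG: revision" line. Something along these lines could produce the installTags file the script reads:

```shell
# Sketch only: pull install tag names out of `cvs log` output on stdin.
# Assumes each symbolic name appears as an indented "TAG: revision" line.
install_tags() {
    sed -n 's/^[[:space:]]*\(INSTALL_RELEASE[^:[:space:]]*\):.*$/\1/p'
}

# Example (the file name is hypothetical):
#   cvs log some/busy/file.c | install_tags > installTags
```

The tags come out in whatever order `cvs log` prints them, which is presumably why the script below reverses the list with tac.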

For simplicity here is the script:

#!/bin/sh

CVS_MODULE=source
SVN_DIRNAME=second
echo "Starting conversion"

tac installTags | (
read oldTag

rm -rf /u1/repository/second initial second

echo "Creating subversion repository"
svnadmin create /u1/repository/second

echo "Checking out $oldTag of $CVS_MODULE for initial import"
cvs -Q export -d initial -r $oldTag $CVS_MODULE

echo "Importing into subversion"
svn import -q -m"Initial Import of revision $oldTag" \
        file:///u1/repository/second initial

echo "Checking a working copy out of subversion"
svn co -q file:///u1/repository/second

while read newTag
do
         date
         echo "Old Tag: $oldTag New: $newTag" ;
         rm -rf old new
         cvs -Q export -d old -r $oldTag $CVS_MODULE
         cvs -Q export -d new -r $newTag $CVS_MODULE

         echo "Diff'ing the versions."
         diff -u -N -r old new > patch-$oldTag-$newTag

         echo "Applying the patch"
         ( cd second ; patch -p1 < ../patch-$oldTag-$newTag )
         echo "$newTag" > $SVN_DIRNAME/cvsTag

         while ( cd second ; svn status | grep "^?" > /dev/null )
         do
                 ( cd second ; svn status | grep "^?" | cut -f6- -d" " |
                         xargs -r svn add )
         done

         ( cd second ; svn status | grep '^!' | cut -f6- -d" " |
                 xargs -r svn delete )

         if svn status "$SVN_DIRNAME" | grep -v "^[AMRD]"
         then
                 exit 1
         fi

         echo "Checking into subversion"
         svn ci -m"Historical modification from CVS Tag: $oldTag to $newTag" \
                 "$SVN_DIRNAME"

         echo "Sleeping for 1 second."
         sleep 1
         oldTag=$newTag
done
)

It isn't particularly efficient, but it gets me some data. The only
caveat is that if you pressed Control-C during a tag, this will remove
the files and recreate them in the next revision. I have watched for
those and removed the troublesome tags from installTags.
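
The add and delete steps in the script rely on `cut -f6-` matching the column layout of `svn status`. A minimal sketch that avoids counting columns, assuming only that each relevant line starts with the status character followed by whitespace and then the path (the function name is made up for illustration):

```shell
# Sketch only: print the paths from `svn status` lines (read on stdin)
# whose first column is the given single character, e.g. "?" or "!".
# Assumes the remaining status columns on those lines are blank.
paths_with_status() {
    sed -n "s/^[$1][[:space:]]*//p"
}

# Example, mirroring the add step of the script:
#   ( cd second && svn status | paths_with_status '?' | xargs -r svn add )
```

Like the original pipeline, this hands the paths to xargs unquoted, so file names containing spaces would still need separate handling.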

I tried using db_archive there, and it told me this:

db_archive: region error detected; run recovery.
db_archive: Ignoring log file: log.0000000082: unsupported log version 5
db_archive: PANIC: Invalid argument
db_archive: open: DB_RUNRECOVERY: Fatal error, run database recovery

So then I did:

db_recover

Now all my __db files are gone... oops! Small path problem... I was
using the db3 utils that come first on my RedHat path, as opposed to the
utils that I built for this. Oops! I will re-run and find out what is
going on. Just FYI, this took 4.2 hours of running check-ins. It is my
opinion that this is a VM problem because I am using all of the RAM on
the box. I have all the patch files so I can estimate the rough size of
changes if you are interested in the performance of this. I will send
more information when I get the stuff rebuilt. There are roughly 800MB
of log files so it appears to be a log file issue, not a repository
issue. By the write times it appears to have just cycled through the 10M
files, creating a new one each time, so blowing away the old ones would
be safe.
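
A minimal sketch of the log check, written so that the db_archive binary is named explicitly rather than picked up from the path. The DB_ARCHIVE variable and the function names are illustrative, not part of Subversion or Berkeley DB; `-h` names the database environment directory and `-a` asks for absolute paths:

```shell
# Sketch only: ask db_archive which log files are no longer needed.
# Set DB_ARCHIVE to the utility built against the same Berkeley DB
# version as the repository; it defaults to whatever is on the path.
DB_ARCHIVE=${DB_ARCHIVE:-db_archive}

unused_logs() {
    # $1 is the repository's db directory.
    "$DB_ARCHIVE" -a -h "$1"
}

unused_log_count() {
    unused_logs "$1" | grep -c .
}

# Example (paths are hypothetical):
#   DB_ARCHIVE=/usr/local/BerkeleyDB/bin/db_archive
#   unused_logs /u1/repository/second/db
```

Looking at the list before deleting anything seems the safer order here, given how the wrong set of utilities behaved above.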

        Thanks for the feedback,

                Kirby

cmpilato@collab.net wrote:
> Kirby, thanks for your interest in Subversion. Some thoughts:
>
> - Berkeley DB uses write-ahead logging, and the log files it writes
> (found as path/to/repos/db/log.*) can get to be really big,
> really fast. Have you tried using the db_archive utility
> (which comes with BDB) to determine which of those log files can
> be safely removed, and then removed those? (NOTE to the dev-list:
> we really need to automate this process, code-internally if
> possible)
>
> - When you say you "exported out successive versions of CVS", does
> this mean you simply cycled through all the tags that were made
> over the course of the 9 months? Just wondering.
>
> "Kirby C. Bohling" <kbohling@birddog.com> writes:
>
>
>>SubVersionists,
>>
>>I am currently interested in replacing CVS. I have evaluated several
>>different source repositories. SVN is very interesting, and looks like
>>a candidate for my replacement (I will wait for it to be finished of
>>course).
>>
>>So in the interest of tinkering with it, I took my current CVS
>>repository and checked out about 9 months worth of install tags (think
>>230 revisions). The source code is roughly 10-12MB. I did an export
>>from CVS, checked into SVN, then exported out successive versions of CVS,
>>did a diff, applied the patch, and checked the result into SVN. Once I got
>>this up and running I could play with it to my heart's content with
>>a real amount of data in it.
>>
>>I have two concerns to point out. I have searched the archives for
>>similar things, but I have yet to find them. One is that the same
>>repository in CVS takes about 130MB of disk space, and has 2 years worth
>>of revisions. The SVN repository was at 1.3GB when I completely ran the
>>filesystem out of space, and it held only 8-9 months of revisions and was
>>missing several of the huge files from early in the CVS tree that had
>>been removed. The second is that while check-ins take a lot of memory,
>>checkouts take more. I have roughly 300MB of VM on my Linux machine, and
>>I literally couldn't check out trees that I had used for a year in CVS.
>>The directory had 1000 5-10K files that I was using as data for a
>>regression test. I can give you as much or as little information as you
>>want if this is a new issue. I probably can't give you the real data
>>(NDAs are like that *grin*).
>>
>>I think SVN has a serious shot at being a CVS killer, and addressing
>>these two issues will go a long way toward getting me to switch when
>>SVN is ready for prime time. I was unsure if I should present them here
>>or put them in as bugs/features. My apologies if I chose poorly.
>>
>> Thanks,
>> Kirby
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
>>For additional commands, e-mail: dev-help@subversion.tigris.org
>>

------- End of forwarded message -------

Received on Sat Oct 21 14:37:03 2006

This is an archived mail posted to the Subversion Dev mailing list.
