
RE: application ill-suited for svn?

From: Dan White <dwhite_at_clubmom.com>
Date: 2006-01-29 11:10:15 CET

Matt,
Each revision (I verified the commits are only single files) is generating
a file in db/revs containing the delta for the file that was changed. It
also contains data that looks like this:
 
    K 13
    filename1.xml
    V 20
    file zy.0.r7854/1078
    K 13
    filename2.xml
    V 19
    file 100.0.r7873/38
    K 13
    filename3.xml
    V 20
    file 2la.0.r18919/41
    K 13
    filename4.xml
    V 22
    file 1fk.0.r11244/1372

With those 4 lines of data for every file and directory in the repository
(15000+) you can see how we're getting 700k per commit, despite only
committing a 2-3k file delta.
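
As a rough check of that arithmetic, here is a minimal Python sketch that sizes the four sample entries above and scales the average up to 15000 entries. The helper name entry_size is made up for illustration, and the sketch assumes every entry in the listing looks like the sample (a 13-character name and a value of about 20 characters), which may not hold for the real repository.

```python
# Hypothetical helper: bytes used by one "K n / name / V n / value" record
# in the listing shown above, counting one newline per line.
def entry_size(name: str, value: str) -> int:
    key = name.encode("utf-8")
    val = value.encode("utf-8")
    return (len(f"K {len(key)}\n") + len(key) + 1
            + len(f"V {len(val)}\n") + len(val) + 1)

# The four entries quoted in this mail.
sample = [
    ("filename1.xml", "file zy.0.r7854/1078"),
    ("filename2.xml", "file 100.0.r7873/38"),
    ("filename3.xml", "file 2la.0.r18919/41"),
    ("filename4.xml", "file 1fk.0.r11244/1372"),
]

average = sum(entry_size(n, v) for n, v in sample) / len(sample)

# Assumption: about 15000 entries, all similar to the sample.
estimate = average * 15000
print(f"about {average:.1f} bytes per entry, {estimate / 1024:.0f} KiB per listing")
```

At roughly 45 bytes per entry the listing alone comes to somewhere around 650-700k, which is in line with the growth per commit reported here.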

I found this doc on the fsfs file formats:
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_fs/structure

________________________________

From: Matt Doran [mailto:matt.doran@papercut.biz]
Sent: Saturday, January 28, 2006 7:22 PM
To: Dan White
Cc: users@subversion.tigris.org
Subject: Re: application ill-suited for svn?

Hi Dan,

Even though Subversion has global revision numbers, it only stores the
diffs of files that have changed in the commit (plus any meta-data that
goes along with the commit ... like commit messages, etc). With FSFS
things are slightly more complex than that: it uses a clever technique
called skip-deltas to optimize accessing recent revisions, without
having to have write permissions to previous revisions when committing.
However, there is a slight size penalty with this approach. You can
read about this here:
http://svn.collab.net/repos/svn/trunk/notes/skip-deltas. My
understanding is that because of this, repositories using the BDB
backend will be a little smaller than FSFS, but BDB has some other
trade-offs.
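
To illustrate the idea, here is a small Python sketch of how a skip-delta base could be chosen. It follows my reading of the skip-deltas notes linked above, where the FSFS delta base for change number N of a file is N with its lowest set bit cleared. Treat that rule, and the function names delta_base and delta_chain, as assumptions for illustration rather than the actual libsvn_fs_fs code. N here counts changes to that one file, not the global revision number.

```python
def delta_base(n: int) -> int:
    """Clear the lowest set bit of n (assumed FSFS skip-delta rule)."""
    return n & (n - 1)

def delta_chain(n: int) -> list:
    """List the bases that must be read to reconstruct change number n."""
    chain = []
    while n > 0:
        n = delta_base(n)
        chain.append(n)
    return chain

# 54 is 110110 in binary, so its base is 110100 = 52.
print(delta_base(54))
print(delta_chain(54))
```

Under this rule the chain length is the number of 1 bits in N, so reconstructing any version needs only a logarithmic number of deltas. The price is that each delta is taken against an older base and tends to be larger, which is the size penalty mentioned above.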

I'm surprised that each commit is adding 700K. It doesn't sound right,
but an svn dev might be able to add more to the discussion. In my
experience SVN is more space efficient than CVS.

Is there a possibility that you are committing more than you think in
each commit? For example, if you run "svn log -v" on recent revisions,
are more changed files listed than you would expect?
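
If there are too many revisions to eyeball, the check can be scripted. Below is a minimal Python sketch that counts the changed paths per revision in the output of "svn log -v --xml". The function name changed_paths_per_revision and the SAMPLE data (revision numbers, author, paths) are invented for illustration; in practice you would feed it the real output of the command.

```python
import xml.etree.ElementTree as ET

def changed_paths_per_revision(xml_text: str) -> dict:
    """Map revision number -> number of changed paths in that revision."""
    root = ET.fromstring(xml_text)
    counts = {}
    for entry in root.iter("logentry"):
        revision = int(entry.get("revision"))
        counts[revision] = len(entry.findall("./paths/path"))
    return counts

# Hypothetical sample in the shape produced by "svn log -v --xml".
SAMPLE = """<log>
<logentry revision="62001">
<author>app</author>
<paths>
<path action="M">/data/filename1.xml</path>
</paths>
<msg>update</msg>
</logentry>
<logentry revision="62000">
<author>app</author>
<paths>
<path action="M">/data/filename2.xml</path>
<path action="A">/data/filename3.xml</path>
</paths>
<msg>update</msg>
</logentry>
</log>"""

for revision, count in changed_paths_per_revision(SAMPLE).items():
    print(f"r{revision}: {count} changed path(s)")
```

Any revision reporting more than one changed path would be worth a closer look.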

Cheers,
Matt

Dan White wrote:

        Sorry, I was wrong about the number of files. We actually have about
        13k files. Each commit now is adding almost 700k to /db/revs. If we
        were versioning only at the file level (which is all we really require),
        each commit should only use at most the size of the files being updated,
        plus any commit comments. This is one reason I'd consider going back to
        cvs for this particular repository. I believe we'd have to do a
        significant amount of work to recode our app to interface with cvs.
        
        As far as pruning older revisions goes, that's possible for roughly 25%
        of the files in the repository. I rebuilt the repository with the
        historical revisions for the other 75%, then imported only the most
        recent version of the 25% in a single mass commit. This decreased the
        repository size to about 11gb.
        
        -----Original Message-----
        From: Kevin Greiner [mailto:greinerk@gmail.com]
        Sent: Friday, January 27, 2006 5:28 PM
        To: Dan White
        Cc: users@subversion.tigris.org
        Subject: Re: application ill-suited for svn?
        
        On 1/25/06, Dan White <dwhite@clubmom.com> wrote:
          

                Unfortunately we didn't consider how repository
                level versioning (which has no benefit in this application) would
                inflate the db size. Some 6000 files, each only a few kb in size, and
                62000 revisions later, our /db/revs dir is about 19gb in size and
                growing almost 1gb daily. We never have multiple file commits.
                    

        
        This is the first I've heard about repository level versioning inflating
        the db size. Could you elaborate? If I did my math right (19,000,000kb
        / 62,000 revs) you're averaging about 300kb per commit. That does seem
        high to me. And if you're growing at 1gb/day that means you're getting
        roughly 3,300 commits/day. Does that sound about right?
        
        I'm wondering if you could remove older revisions periodically? The
        dump file outputs revisions in date order but I don't know if you
        could chop off, say, the first 20,000 revisions without borking the
        resulting loaded repo or not. Anyone tried this?
        
        
        ---------------------------------------------------------------------
        To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
        For additional commands, e-mail: users-help@subversion.tigris.org
        
        
          

Received on Sun Jan 29 11:10:44 2006

This is an archived mail posted to the Subversion Users mailing list.
