
Re: application ill-suited for svn?

From: Matt Doran <matt.doran_at_papercut.biz>
Date: 2006-01-29 11:24:16 CET

Hi Dan,

Hmmm, it looks like FSFS doesn't scale very well to 15K files in a
single directory... but that's getting beyond my knowledge. We need a
developer with FSFS knowledge to comment.

Maybe you should consider storing the files in a more hierarchical
structure based on the file name (like Squid does with its cache).
But if this is a problem with FSFS or SVN in general, then it might be
worth fixing.
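A minimal sketch of such a layout, assuming the application can compute each file's location from its name. The bucket is taken from an MD5 digest of the file name; `hashed_path` and its parameters are hypothetical, and Squid's own cache layout differs in detail.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(filename: str, levels: int = 1, width: int = 2) -> str:
    """Map a flat file name to a bucketed path such as 'ab/filename1.xml'.

    Each directory level uses `width` hex characters of the MD5 digest of the
    name, so a given file always lands in the same bucket and files spread
    roughly evenly across buckets.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(*parts, filename))


for name in ("filename1.xml", "filename2.xml"):
    print(hashed_path(name))
```

With one level of two hex characters there are 256 buckets, so 15,000 files come to about 59 per bucket. If each commit rewrites the listing of every directory on the changed path, as the rev-file excerpt in Dan's message suggests, a commit would then write a 256-entry root listing plus one small bucket listing instead of one 15,000-entry listing.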

Have you tried creating a BDB repository, dumping your existing
repository, and loading the dump into the BDB one, just to see whether
there is a significant difference?
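One way to script that comparison, as a sketch: the repository paths are placeholders, the helper names are hypothetical, and the `svnadmin` on PATH must have been built with Berkeley DB support for `--fs-type bdb` to work. The dump is piped straight into `svnadmin load`, so no intermediate dump file is written to disk.

```python
import subprocess
from typing import List


def bdb_migration_commands(src_repo: str, dst_repo: str) -> List[List[str]]:
    """Build the svnadmin command lines for copying src_repo into a new BDB repo."""
    return [
        ["svnadmin", "create", "--fs-type", "bdb", dst_repo],
        ["svnadmin", "dump", "--quiet", src_repo],
        ["svnadmin", "load", "--quiet", dst_repo],
    ]


def migrate_to_bdb(src_repo: str, dst_repo: str) -> None:
    """Create the BDB repo, then pipe `svnadmin dump` into `svnadmin load`."""
    create, dump, load = bdb_migration_commands(src_repo, dst_repo)
    subprocess.run(create, check=True)
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.Popen(load, stdin=dumper.stdout)
    # Close our copy of the pipe so dump gets SIGPIPE if load exits early.
    dumper.stdout.close()
    load_rc = loader.wait()
    dump_rc = dumper.wait()
    if load_rc != 0 or dump_rc != 0:
        raise RuntimeError(f"dump exited {dump_rc}, load exited {load_rc}")
```

For example, `migrate_to_bdb("/path/to/fsfs-repo", "/path/to/bdb-repo")`, followed by comparing the disk usage of the two repository directories.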

-- Matt

Dan White wrote:
> Matt,
> Each revision (I verified that each commit touches only a single file)
> generates a file in db/revs containing the delta for the file that was
> changed. It also contains data that looks like this:
>
> K 13
> filename1.xml
> V 20
> file zy.0.r7854/1078
> K 13
> filename2.xml
> V 19
> file 100.0.r7873/38
> K 13
> filename3.xml
> V 20
> file 2la.0.r18919/41
> K 13
> filename4.xml
> V 22
> file 1fk.0.r11244/1372
>
> With those 4 lines of data for every file and directory in the repository
> (15000+), you can see how we're getting 700k per commit, despite only
> committing a 2-3k file delta.
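To check that arithmetic, here is a minimal Python sketch of the `K <len>` / key / `V <len>` / value layout shown in the excerpt. It is not Subversion's own parser: it assumes names contain no newlines and treats an optional `END` line as the terminator. The per-entry average comes only from the four quoted entries, all with 13-character names, so real listings will vary with name length.

```python
from typing import Dict

# The four entries quoted from the rev file above.
SAMPLE = """\
K 13
filename1.xml
V 20
file zy.0.r7854/1078
K 13
filename2.xml
V 19
file 100.0.r7873/38
K 13
filename3.xml
V 20
file 2la.0.r18919/41
K 13
filename4.xml
V 22
file 1fk.0.r11244/1372
"""


def parse_hash_dump(text: str) -> Dict[str, str]:
    """Parse 'K <len>' / key / 'V <len>' / value records, stopping at 'END'."""
    lines = text.splitlines()
    entries: Dict[str, str] = {}
    i = 0
    while i < len(lines) and lines[i] != "END":
        if i + 3 >= len(lines):
            raise ValueError(f"truncated record near line {i + 1}")
        k_tag, k_len = lines[i].split()
        v_tag, v_len = lines[i + 2].split()
        key, value = lines[i + 1], lines[i + 3]
        if k_tag != "K" or v_tag != "V":
            raise ValueError(f"unexpected record header near line {i + 1}")
        # Length prefixes count bytes, so compare against the UTF-8 encoding.
        if len(key.encode()) != int(k_len) or len(value.encode()) != int(v_len):
            raise ValueError(f"length mismatch near line {i + 1}")
        entries[key] = value
        i += 4
    return entries


def entry_size(key: str, value: str) -> int:
    """Bytes one entry occupies: both header lines plus the key and value lines."""
    k, v = key.encode(), value.encode()
    return len(f"K {len(k)}\n") + len(k) + 1 + len(f"V {len(v)}\n") + len(v) + 1


entries = parse_hash_dump(SAMPLE)
sizes = [entry_size(k, v) for k, v in entries.items()]
print(sizes)  # [45, 44, 45, 47]
avg = sum(sizes) / len(sizes)  # 45.25 bytes per entry
print(f"~{avg * 15_000:,.0f} bytes for a 15,000-entry listing")  # ~678,750 bytes ...
```

At roughly 45 bytes per entry, a 15,000-entry listing comes to about 680 KB. That is close to the ~700k per commit reported here, which is consistent with the whole directory listing being written out on every commit.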
>
> I found this doc on the fsfs file formats:
> http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_fs/structure
>
>
> ________________________________
>
> From: Matt Doran [mailto:matt.doran@papercut.biz]
> Sent: Saturday, January 28, 2006 7:22 PM
> To: Dan White
> Cc: users@subversion.tigris.org
> Subject: Re: application ill-suited for svn?
>
>
> Hi Dan,
>
> Even though Subversion has global revision numbers, it only stores the
> diffs of files that have changed in the commit (plus any metadata that
> goes along with the commit, such as commit messages). With FSFS, things
> are slightly more complex than that: it uses a clever technique called
> skip-deltas to optimize access to recent revisions without needing write
> access to previous revisions when committing. However, there is a slight
> size penalty with this approach. You can read about this here:
> http://svn.collab.net/repos/svn/trunk/notes/skip-deltas. My
> understanding is that, because of this, repositories using the BDB
> backend will be a little smaller than FSFS ones, but BDB has some other
> trade-offs.
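A minimal sketch of the skip-delta idea as described in the notes/skip-deltas document: the n-th change to a node is stored as a delta against change n with its lowest set bit cleared, so rebuilding any version applies at most about log2(n) deltas. Here n counts changes to a single file, not global revision numbers. The function names are hypothetical, and the real FSFS code may refine this scheme.

```python
from typing import List


def skip_delta_base(n: int) -> int:
    """Delta base for the n-th change to a node: n with its lowest set bit cleared."""
    return n & (n - 1)


def reconstruction_chain(n: int) -> List[int]:
    """Change numbers visited when rebuilding change n, ending at the fulltext (0)."""
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


print(reconstruction_chain(54))               # [54, 52, 48, 32, 0]
print(len(reconstruction_chain(62_000)) - 1)  # 7
```

The size penalty comes from the same rule. A delta from change 48 to change 32, for example, covers sixteen changes' worth of edits, so it tends to be larger than a delta against the immediately preceding version.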
>
> I'm surprised that each commit is adding 700K. It doesn't sound right,
> but an svn dev might be able to add more to the discussion. In my
> experience SVN is more space efficient than CVS.
>
> Is there a possibility that you're committing more than you think in each
> commit? For example, if you run "svn log -v" on recent revisions, are
> more changed files listed than you would expect?
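That check can be automated with the XML form of the log, as in the sketch below. It assumes a client that supports `svn log --xml` and `--limit`. The sample log is made up and abbreviated, and the helper names are hypothetical. A revision with more than one `<path>` element would contradict the single-file-commit assumption.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Made-up, abbreviated output in the shape of `svn log -v --xml`.
SAMPLE = """<log>
<logentry revision="62002">
<author>dan</author>
<date>2006-01-28T18:00:00.000000Z</date>
<paths>
<path action="M">/content/filename1.xml</path>
<path action="M">/content/filename2.xml</path>
<path action="M">/content/filename3.xml</path>
</paths>
<msg>update</msg>
</logentry>
<logentry revision="62001">
<author>dan</author>
<date>2006-01-28T17:59:00.000000Z</date>
<paths>
<path action="M">/content/filename4.xml</path>
</paths>
<msg>update</msg>
</logentry>
</log>"""


def changed_path_counts(log_xml: Union[str, bytes]) -> Dict[int, int]:
    """Map each revision in `svn log -v --xml` output to its number of changed paths."""
    root = ET.fromstring(log_xml)
    return {
        int(entry.get("revision")): len(entry.findall("paths/path"))
        for entry in root.iter("logentry")
    }


def recent_log_xml(target: str, limit: int = 20) -> bytes:
    """Run `svn log -v --xml --limit N` against a working copy path or URL."""
    result = subprocess.run(
        ["svn", "log", "-v", "--xml", "--limit", str(limit), target],
        capture_output=True, check=True,
    )
    return result.stdout


print(changed_path_counts(SAMPLE))  # {62002: 3, 62001: 1}
```

Against a real repository, call `changed_path_counts(recent_log_xml(url))`. Returning bytes lets ElementTree honor the encoding declaration that `svn` puts at the top of its XML output.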
>
> Cheers,
> Matt
>
> Dan White wrote:
>
> Sorry, I was wrong about the number of files. We actually have about
> 13k files. Each commit now is adding almost 700k to /db/revs. If we
> were versioning only at the file level (which is all we really require),
> each commit would use at most the size of the files being updated, plus
> any commit comments. This is one reason I'd consider going back to cvs
> for this particular repository. I believe we'd have to do a significant
> amount of work to recode our app to interface with cvs.
>
> As far as pruning older revisions goes, that's possible for roughly 25%
> of the files in the repository. I rebuilt the repository with the
> historical revisions for the other 75%, then imported only the most
> recent version of the 25% in a single mass commit. This decreased the
> repository size to about 11gb.
>
> -----Original Message-----
> From: Kevin Greiner [mailto:greinerk@gmail.com]
> Sent: Friday, January 27, 2006 5:28 PM
> To: Dan White
> Cc: users@subversion.tigris.org
> Subject: Re: application ill-suited for svn?
>
> On 1/25/06, Dan White <dwhite@clubmom.com> wrote:
>
>
> Unfortunately we didn't consider how repository-level versioning
> (which has no benefit in this application) would inflate the db size.
> Some 6000 files, each only a few kb in size, and 62000 revisions later,
> our /db/revs dir is about 19gb in size and growing almost 1gb daily.
> We never have multiple-file commits.
>
>
>
> This is the first I've heard about repository-level versioning inflating
> the db size. Could you elaborate? If I did my math right (19,000,000kb /
> 62,000 revs), you're averaging about 300kb per commit. That does seem
> high to me. And if you're growing at 1gb/day, that means you're getting
> roughly 3,300 commits/day. Does that sound about right?
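The estimate works out as follows. The figures are taken from the quoted messages, using decimal units (1 GB = 1,000,000 KB) as the original back-of-the-envelope math does. With the unrounded per-commit size, the daily commit count comes to about 3,263, in line with "roughly 3,300".

```python
# Figures from the quoted messages, in decimal units (1 GB = 1,000,000 KB).
revs_dir_kb = 19_000_000        # ~19 GB in db/revs
revisions = 62_000
growth_kb_per_day = 1_000_000   # ~1 GB per day

kb_per_commit = revs_dir_kb / revisions
commits_per_day = growth_kb_per_day / kb_per_commit

print(f"{kb_per_commit:.0f} KB per commit")       # 306 KB per commit
print(f"{commits_per_day:,.0f} commits per day")  # 3,263 commits per day
```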
>
> I'm wondering whether you could remove older revisions periodically. The
> dump file outputs revisions in date order, but I don't know whether you
> could chop off, say, the first 20,000 revisions without borking the
> resulting loaded repo. Anyone tried this?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org

Received on Sun Jan 29 11:25:24 2006

This is an archived mail posted to the Subversion Users mailing list.
