
Re: Any FSFS rep-sharing experts out there?

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Thu, 8 Oct 2009 10:36:43 +0200 (Jerusalem Standard Time)

Paul Querna wrote on Thu, 8 Oct 2009 at 00:48 -0700:
> On Wed, Oct 7, 2009 at 2:28 PM, David Glasser <glasser_at_davidglasser.net> wrote:
> > On Tue, Oct 6, 2009 at 7:10 PM, Paul Querna <chip_at_force-elite.com> wrote:
> >> With help from Branko on IRC last night, I pulled out the following stats
> >> from the ASF repository:
> >> 15,612,528 representations total [1]
> >> 4,254,361 unique representations in the sqlitedb [2]
> >> (3.7x ratio)
> >
> > I'm not sure how useful that number is.  Is everything in the repo in
> > the db, or only reps created since rep-sharing was enabled?
>
> Everything in the repo. We did a full dump and reload for svn 1.6,
> and enabled rep-sharing before starting the load. (We filtered out some
> paths at the same time, so it wasn't a pointless exercise.)
>
> >  The more
> > relevant number is "what is the sum of all the reference count
> > numbers, compared to the 4.2 million number".
>
> Tell me what to run to get you the interesting statistics, and I'm
> happy to do that :)
>

Since you enabled rep-sharing before starting the load, (IIUC) the
"sum of all the reference counts" should be the same 15M number as above.

(If you had rep-sharing disabled during portions of history, then the
number of reps in those portions should be subtracted from the 15M.)

To see how much disk space is saved, I suppose you'll have to dump|load
with enable-rep-sharing=false set --- I don't know of an easier way
(not without having the reference counts).
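
Once both copies exist, comparing them could look something like the
sketch below. This is only an assumption about how one might measure it:
the helper names are mine, and it sums apparent file sizes rather than
allocated disk blocks, so filesystem overhead is ignored.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks, count real files only
                total += os.path.getsize(full)
    return total


def savings(shared_bytes: int, unshared_bytes: int) -> float:
    """Fraction of disk space saved by rep-sharing (0.0 if nothing to compare)."""
    if unshared_bytes == 0:
        return 0.0
    return (unshared_bytes - shared_bytes) / unshared_bytes
```

You would call it as savings(dir_size(shared_repo), dir_size(unshared_repo)),
where the two arguments are the paths of the existing repository and of
the copy loaded with rep-sharing disabled (both placeholders here).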

Daniel

> > But more importantly, because the *only* advantage of rep-sharing is
> > that it potentially reduces disk use (there are absolutely no potential
> > time savings (unless you are very hopeful about disk cache) and there
> > is increased locking), the only relevant stats IMHO are "how much disk
> > space does the repo take up, compared to how much it would take up
> > without rep sharing... and how does that size delta affect the needs
> > of the ASF (cost of disks, backup speed, etc)".
>
> We saw a pretty massive speedup upgrading 1.5 -> 1.6. I do attribute
> that somewhat to less disk thrashing, but it's hard to compare that to
> pre-rep-sharing, since we did lots of things around that time to get
> speedups every way we could. Reducing repo size is a big deal, though:
> our repo is easily 80 GB++, and cutting that by more than 20% is huge.
>
> -Paul
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2404840
Received on 2009-10-08 10:37:04 CEST

This is an archived mail posted to the Subversion Dev mailing list.
