
Re: Any FSFS rep-sharing experts out there?

From: Paul Querna <chip_at_force-elite.com>
Date: Thu, 8 Oct 2009 00:48:09 -0700

On Wed, Oct 7, 2009 at 2:28 PM, David Glasser <glasser_at_davidglasser.net> wrote:
> On Tue, Oct 6, 2009 at 7:10 PM, Paul Querna <chip_at_force-elite.com> wrote:
>> On Mon, Oct 5, 2009 at 4:31 PM, David Glasser <glasser_at_davidglasser.net> wrote:
>>> On Mon, Oct 5, 2009 at 9:52 AM, Branko Čibej <brane_at_xbc.nu> wrote:
>>>> Daniel Shahaf wrote:
>>>>> Branko Cibej wrote on Mon, 5 Oct 2009 at 18:08 +0200:
>>>>> IIUC, the size of the DB is proportional to the number of (unique)
>>>>> representations.  This doesn't tell anything about the amount of space
>>>>> saved (by reusing representations).
>>>>>
>>>>
>>>> Oh, yes, you're right. Silly me.
>>>>
>>>> But anyway the question is irrelevant. If we manage to lock up the
>>>> server for tens of seconds because of a slightly larger-than-usual
>>>> commit, we need to fix it. This is pretty much on my plate right now,
>>>> but I'll ask around for help on understanding FSFS details.
>>>
>>> The relevance of the question is that if you're not actually getting a
>>> benefit from rep caching (a feature whose cost/benefit ratios I
>>> personally felt were not strong enough to warrant it being turned on
>>> by default), you could just avoid all the contention by not using it.
>>
>> With help from Branko on IRC last night, I pulled out the following
>> stats from the ASF repository:
>> 15,612,528 representations total [1]
>> 4,254,361 unique representations in the sqlitedb [2]
>> (3.7x ratio)
>
> I'm not sure how useful that number is.  Is everything in the repo in
> the db, or only reps created since rep-sharing was enabled?

Everything in the repo. We did a full dump and reload for svn 1.6,
and enabled rep-sharing before starting the load. (We filtered out some
paths at the same time, so it wasn't a pointless exercise.)

>  The more
> relevant number is "what is the sum of all the reference count
> numbers, compared to the 4.2 million number".

Tell me what to run to get you the interesting statistics, and I'm
happy to do that :)
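
As a starting point, here is a rough sketch of how the "sum of reference counts versus unique reps" figure might be computed from the rep-sharing SQLite database. Everything schema-related is an assumption to check against the real file first: the table name `rep_cache`, the counter column `reuse_count`, and the reading that the counter records reuses beyond the first reference. The function name `rep_sharing_stats` and the in-memory demo data are purely illustrative.

```python
import sqlite3


def rep_sharing_stats(conn, table="rep_cache", count_column="reuse_count"):
    """Return (unique_reps, total_reuses, references_per_unique_rep).

    Assumes one row per unique representation and a per-row counter of
    additional reuses. Both names are hypothetical defaults; inspect the
    actual schema (e.g. with ".schema" in the sqlite3 shell) before use.
    """
    # Identifiers cannot be bound as SQL parameters, so they are
    # interpolated here; only pass trusted, known names.
    query = f"SELECT COUNT(*), COALESCE(SUM({count_column}), 0) FROM {table}"
    unique_reps, total_reuses = conn.execute(query).fetchone()

    # Each unique rep is referenced once by its original node,
    # plus once more for every recorded reuse.
    total_references = unique_reps + total_reuses
    ratio = total_references / unique_reps if unique_reps else 0.0
    return unique_reps, total_reuses, ratio


def demo():
    """Build a tiny in-memory database with made-up rows and report stats."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rep_cache (hash TEXT PRIMARY KEY, "
        "reuse_count INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rep_cache VALUES (?, ?)",
        [("a", 0), ("b", 3), ("c", 1)],
    )
    stats = rep_sharing_stats(conn)
    conn.close()
    return stats


if __name__ == "__main__":
    print(demo())
```

Against a real repository, the connection would point at the repository's rep-sharing database file instead of `:memory:`, ideally opened on a copy so the query does not contend with commits for the database lock.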

> But more importantly, because the *only* advantage of rep-sharing is
> that it potentially reduces disk use (there is absolutely no potential
> time savings (unless you are very hopeful about disk cache) and there
> is increased locking), the only relevant stats IMHO are "how much disk
> space does the repo take up, compared to how much it would take up
> without rep sharing... and how does that size delta affect the needs
> of the ASF (cost of disks, backup speed, etc)".

We saw a pretty massive speedup upgrading from 1.5 to 1.6. I do
attribute that somewhat to less disk thrashing, but it's hard to compare
that to pre-rep-sharing, since we did lots of things around that time to
get speedups every way we could. Reducing repo size is a big deal,
though: our repo is easily over 80 GB, so cutting that by more than 20%
is huge.

-Paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2404815
Received on 2009-10-08 09:48:28 CEST

This is an archived mail posted to the Subversion Dev mailing list.
