
Re: Any FSFS rep-sharing experts out there?

From: David Glasser <glasser_at_davidglasser.net>
Date: Wed, 7 Oct 2009 14:28:01 -0700

On Tue, Oct 6, 2009 at 7:10 PM, Paul Querna <chip_at_force-elite.com> wrote:
> On Mon, Oct 5, 2009 at 4:31 PM, David Glasser <glasser_at_davidglasser.net> wrote:
>> On Mon, Oct 5, 2009 at 9:52 AM, Branko Čibej <brane_at_xbc.nu> wrote:
>>> Daniel Shahaf wrote:
>>>> Branko Cibej wrote on Mon, 5 Oct 2009 at 18:08 +0200:
>>>> IIUC, the size of the DB is proportional to the number of (unique)
>>>> representations.  This doesn't tell anything about the amount of space
>>>> saved (by reusing representations).
>>>>
>>>
>>> Oh, yes, you're right. Silly me.
>>>
>>> But anyway the question is irrelevant. If we manage to lock up the
>>> server for tens of seconds because of a slightly larger-than-usual
>>> commit, we need to fix it. This is pretty much on my plate right now,
>>> but I'll ask around for help on understanding FSFS details.
>>
>> The relevance of the question is that if you're not actually getting a
>> benefit from rep caching (a feature whose cost/benefit ratios I
>> personally felt were not strong enough to warrant it being turned on
>> by default), you could just avoid all the contention by not using it.
>
> With help from Branko on IRC last night, I pulled out the following stats
> from the ASF repository:
> 15,612,528 representations total [1]
> 4,254,361 unique representations in the sqlitedb [2]
> (3.7x ratio)

I'm not sure how useful that number is. Is everything in the repo in
the db, or only reps created since rep-sharing was enabled? The more
relevant number is "what is the sum of all the reference count
numbers, compared to the 4.2 million number".
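A minimal sketch of how that comparison could be pulled from `db/rep-cache.db` with Python's standard `sqlite3` module. It assumes, without confirmation from this thread, that the `rep_cache` table has a per-row counter column named `reuse_count`. The function name `reference_count_summary` is hypothetical.

```python
import sqlite3


def reference_count_summary(conn: sqlite3.Connection) -> tuple[int, int, float]:
    """Return (unique reps, sum of reference counts, sum / unique).

    Assumption: rep_cache has a per-row `reuse_count` column. Whether it
    counts every use or only the extra reuses should be checked against
    the schema actually in use.
    """
    # Inspect the schema first so a cache without the counter fails clearly.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(rep_cache)")}
    if "reuse_count" not in columns:
        raise LookupError("rep_cache has no reuse_count column")
    # COALESCE turns the NULL from SUM over an empty table into 0.
    unique, refs = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(reuse_count), 0) FROM rep_cache"
    ).fetchone()
    return unique, refs, (refs / unique if unique else 0.0)
```

Called as `reference_count_summary(sqlite3.connect("/path/to/repos/db/rep-cache.db"))`, the middle value is the figure to set against the 4.2 million unique reps.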

But more importantly, because the *only* advantage of rep-sharing is
that it potentially reduces disk use (there are absolutely no potential
time savings, unless you are very hopeful about disk cache, and there
is increased locking), the only relevant stats IMHO are "how much disk
space does the repo take up, compared to how much it would take up
without rep-sharing, and how does that size delta affect the needs
of the ASF (cost of disks, backup speed, etc.)".
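A rough way to put numbers on that comparison, sketched in Python under two loudly stated assumptions: that `rep_cache` has a hypothetical `reuse_count` column, and that every recorded reuse avoided writing one more copy of that rep at its stored `size`. The second assumption ignores how a fresh copy would actually have been deltified, so the result is an order-of-magnitude estimate, not a measurement. Both function names are hypothetical; `db/revs` is the directory grepped in footnote [1].

```python
import os
import sqlite3


def revs_bytes_on_disk(repos: str) -> int:
    """Sum the sizes of all files under <repos>/db/revs (0 if it is missing)."""
    total = 0
    # os.walk silently yields nothing for a nonexistent directory.
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repos, "db", "revs")):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def estimate_sharing_savings(repos: str, conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (bytes in db/revs now, estimated bytes without rep-sharing).

    Assumption: each unit of the hypothetical reuse_count column stands for
    one extra copy of the rep, at its stored (compressed) size, that was
    never written. Raises sqlite3.OperationalError if the column is absent.
    """
    (saved,) = conn.execute(
        "SELECT COALESCE(SUM(reuse_count * size), 0) FROM rep_cache"
    ).fetchone()
    actual = revs_bytes_on_disk(repos)
    return actual, actual + saved
```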

--dave

> other misc stats:
> 2352 bytes: average size of a compressed rep [3]
> 16043 bytes: average size of an expanded rep [4]
>
> [1] grep -a -r '^text:' $repos/db/revs | wc -l
> [2] select count(*) from rep_cache;
> [3] select AVG(size) from rep_cache;
> [4] select AVG(expanded_size) from rep_cache;
>

-- 
glasser_at_davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2404694
Received on 2009-10-07 23:28:39 CEST

This is an archived mail posted to the Subversion Dev mailing list.
