On Wed, Oct 7, 2009 at 2:28 PM, David Glasser <glasser_at_davidglasser.net> wrote:
> On Tue, Oct 6, 2009 at 7:10 PM, Paul Querna <chip_at_force-elite.com> wrote:
>> On Mon, Oct 5, 2009 at 4:31 PM, David Glasser <glasser_at_davidglasser.net> wrote:
>>> On Mon, Oct 5, 2009 at 9:52 AM, Branko Čibej <brane_at_xbc.nu> wrote:
>>>> Daniel Shahaf wrote:
>>>>> Branko Cibej wrote on Mon, 5 Oct 2009 at 18:08 +0200:
>>>>> IIUC, the size of the DB is proportional to the number of (unique)
>>>>> representations. This doesn't tell anything about the amount of space
>>>>> saved (by reusing representations).
>>>> Oh, yes, you're right. Silly me.
>>>> But anyway the question is irrelevant. If we manage to lock up the
>>>> server for tens of seconds because of a slightly larger-than-usual
>>>> commit, we need to fix it. This is pretty much on my plate right now,
>>>> but I'll ask around for help on understanding FSFS details.
>>> The relevance of the question is that if you're not actually getting a
>>> benefit from rep caching (a feature whose cost/benefit ratios I
>>> personally felt were not strong enough to warrant it being turned on
>>> by default), you could just avoid all the contention by not using it.
>> With help from Branko on IRC last night, I pulled out the following
>> stats from the ASF repository:
>> 15,612,528 representations total
>> 4,254,361 unique representations in the sqlitedb
>> (3.7x ratio)
> I'm not sure how useful that number is. Is everything in the repo in
> the db, or only reps created since rep-sharing was enabled?
Everything in the repo. We did a full dump and reload for svn 1.6,
and enabled rep-sharing before starting the load. (We filtered out some
paths at the same time, so it wasn't a pointless exercise.)
> The more
> relevant number is "what is the sum of all the reference count
> numbers, compared to the 4.2 million number".
Tell me what to run to get you the interesting statistics, and I'm
happy to do that :)
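
For what it's worth, a rough sketch of the kind of query that might answer
this, in Python against the sqlite file. Everything about the schema here is
an assumption on my part: a table called rep_cache with size and reuse_count
columns, where reuse_count counts the extra times a rep was shared after its
first use. The function name is made up for illustration; check the real
schema before trusting any numbers.

```python
import sqlite3


def rep_sharing_stats(conn):
    """Summarize a rep-cache style table.

    Assumed (hypothetical) schema: rep_cache(hash, size, reuse_count),
    where reuse_count is the number of additional references to a
    representation beyond the one that first created it.
    """
    unique_reps, reuses, bytes_saved = conn.execute(
        "SELECT COUNT(*), "
        "       COALESCE(SUM(reuse_count), 0), "
        "       COALESCE(SUM(size * reuse_count), 0) "
        "FROM rep_cache"
    ).fetchone()

    # Every unique rep is referenced once by its creator, plus its reuses.
    total_refs = unique_reps + reuses
    ratio = total_refs / unique_reps if unique_reps else 0.0

    return {
        "unique_reps": unique_reps,
        "total_refs": total_refs,
        "ratio": ratio,
        # On-disk bytes not written again because a rep was shared.
        "bytes_saved": bytes_saved,
    }
```

Something like rep_sharing_stats(sqlite3.connect("rep-cache.db")) would then
give the sum of reference counts next to the unique count, plus an estimate
of the disk space saved, under the assumptions above.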
> But more importantly, because the *only* advantage of rep-sharing is
> that it potentially reduces disk use (there is absolutely no potential
> time savings (unless you are very hopeful about disk cache) and there
> is increased locking), the only relevant stats IMHO are "how much disk
> space does the repo take up, compared to how much it would take up
> without rep sharing... and how does that size delta affect the needs
> of the ASF (cost of disks, backup speed, etc)".
We saw a pretty massive speedup upgrading 1.5 -> 1.6. I do attribute
that somewhat to less disk thrashing, but it's hard to compare that to
pre-rep-sharing, since we did lots of things around that time to get
speedups every way we could. Reducing repo size is a big deal, though:
our repo is easily 80GB+, and cutting that by more than 20% is huge.
Received on 2009-10-08 09:48:28 CEST