
Re: fs-rep-sharing branch

From: Greg Stein <gstein_at_gmail.com>
Date: Wed, 22 Oct 2008 04:28:20 -0700

On Wed, Oct 22, 2008 at 3:13 AM, Erik Huelsmann <ehuels_at_gmail.com> wrote:
> On Wed, Oct 22, 2008 at 11:12 AM, Listman <listman_at_burble.net> wrote:
>> On Oct 22, 2008, at 12:42 AM, Ivan Zhakov wrote:
>>> On Wed, Oct 22, 2008 at 10:29 AM, David Glasser
>>> <glasser_at_davidglasser.net> wrote:
>>>> On Tue, Oct 21, 2008 at 9:39 PM, Listman <listman_at_burble.net> wrote:
>>>>> On Oct 21, 2008, at 8:05 PM, Greg Stein wrote:
>>>>>> After you've changed the editor API, the wc_entry_t structure,
>>>>>> migrated all old clients over to svn_checksum_t, and then switched the
>>>>>> storage defaults over to sha1, *then* we can talk about "an easy
>>>>>> switch".
>>>>>> The simple fact is that we're going to be running around with md5
>>>>>> checksums in hand for a long while. OR we double-compute, and I'm not
>>>>>> willing to burn that much CPU to satisfy somebody's misguided
>>>>>> preconception about md5 collisions. And double-compute generally means
>>>>>> that we *carry around* both checksums. You wanna update all the APIs
>>>>>> for that, too?
>>>>> +1 (on Greg's position for this issue)
>>>>> let's not introduce more performance overheads based on a corner case.
>>>>> svn as it stands is way toooo slooow folks... please don't get
>>>>> distracted from that fact. i'm dealing with 20 minute commits,
>>>>> 15 minute status checks etc and my users want to know why...
>>>>> also, svn already does way too many checksums from what i've been
>>>>> able to decipher.
>>>> Well, if all you care about is speed, then revert the fsfs rep-sharing
>>>> code entirely... it makes FSFS strictly less correct and presumably
>>>> strictly slower, bringing only a space benefit which (for FSFS)
>>>> appears to not be that large.
>>> I agree with David: Subversion reliability is much more important than
>>> speed. Also I do not understand why we care so much about disk space for
>>> the repository: disk space is very cheap and becomes cheaper every day.
>>> I think the priority list should be:
>>> - Reliability
>>> - Speed (CPU/memory usage)
>>> - Disk space
>> with today's SVN performance anyone with a need for centralized DM and large
>> data-sets would be better off using P4.
>> If SVN isn't reliable we lose, but if SVN is so slow that users aren't
>> efficient and get frustrated we still lose.
> We acknowledge speed problems on the client side. Do we have speed
> problems on the server, though, is the question: some of the speed
> problems on the client are being addressed.

Actually, it is more pronounced on the server since it is shared
across many clients. Profiling tests of an svn server show that it
spends its time in two operations: computing deltas, and computing
checksums. An svn server is generally CPU-bound rather than IO-bound.
Planning for capacity requires monitoring of the CPU and adding more
servers well before worrying about overloading a storage system.

So yeah. We *do* have speed problems on the server.
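
As a rough illustration of the checksum cost discussed above, here is a
minimal sketch that times MD5 alone, SHA-1 alone, and the "double-compute"
case where both are fed the same data. It is not Subversion code: the
function name time_checksums, the 64 KiB chunk size, and the synthetic
payload are all assumptions made for the example, and the numbers will vary
by machine and say nothing definitive about a real svn server.

```python
import hashlib
import time


def time_checksums(data: bytes, chunk_size: int = 64 * 1024) -> dict:
    """Hash `data` with md5, sha1, and both together; report time and digests.

    Hypothetical helper for illustration only, not part of Subversion.
    """
    def run(names):
        hashers = [hashlib.new(name) for name in names]
        start = time.perf_counter()
        # Feed the data in chunks, updating every hasher for each chunk,
        # which is what carrying two checksums at once would require.
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            for hasher in hashers:
                hasher.update(chunk)
        elapsed = time.perf_counter() - start
        digests = {name: h.hexdigest() for name, h in zip(names, hashers)}
        return elapsed, digests

    results = {}
    cases = (("md5", ["md5"]), ("sha1", ["sha1"]), ("md5+sha1", ["md5", "sha1"]))
    for label, names in cases:
        elapsed, digests = run(names)
        results[label] = {"seconds": elapsed, "digests": digests}
    return results


if __name__ == "__main__":
    # 32 MiB of synthetic data; the size is an arbitrary choice.
    payload = b"x" * (32 * 1024 * 1024)
    for label, info in time_checksums(payload).items():
        print(f"{label:9s} {info['seconds']:.3f}s")
```

Running it prints one timing line per case, which gives a feel for how much
extra CPU the double-compute path burns compared with a single checksum.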


Received on 2008-10-22 13:28:34 CEST

This is an archived mail posted to the Subversion Dev mailing list.