Re: fs-rep-sharing branch

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Wed, 22 Oct 2008 09:24:25 -0400

Greg Stein wrote:
> On Wed, Oct 22, 2008 at 5:49 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>> Greg Stein wrote:
>>> We *still* have all the problems that md5 is fully-intertwined in our
>>> code. I'm still not willing to do double-checksums and kill millions
>>> of coders for a few researchers who could simply tar their candidate
>>> pairs together, or gzip them. Yes, that's the brutal truth :-P ... the
>>> researchers need to use workarounds, and the millions get a fast
>>> product.
>> Please forgive my ignorance in this matter, but when you look at a typical
>> profile -- from a user's perspective -- of Subversion's bottlenecks, will a
>> second hash calculation even register? Surely network turnarounds and
>> working copy I/O dwarf this additional calculation in terms of cost to the
>> user's time. Don't they? Keep in mind that this calculation is only ever
>> made at commit time, too. It isn't as if we trade in SHA1 currency all over
>> the place.
>
> The client does not use SHA1 "all over the place." It only uses MD5
> values; any SHA1 is accidental because the client simply has no use
> for it. Until you rebuild a large body of code, the client *can't* use
> SHA1 values. Therefore, any keying using SHA1 implies running an
> independent checksum. That CPU adds up, and (frankly) saying that our
> I/O is the bottleneck is (IMO) a false pretense. I intend to fix that,
> so we do a lot less I/O. Additional CPU work will become more
> noticeable. How much? Unclear, but CPU isn't always available simply
> to burn away.

You misread me -- I wasn't claiming that we use SHA1 all over the place. In
fact, I was saying that we don't. ("It *isn't* as if...")

My point was that we still needn't carry SHA1s all over the place if the
only time we use SHA1 in FSFS is when calculating keys for the shared
representation collection. That work happens *only* during commits, and
will not affect the performance of any operations besides commit (which is
probably the least common of all version control operations).
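
To make that scoping concrete, here is a minimal Python sketch. It is not
FSFS code: the RepStore class, its dictionary-backed storage, and the
sha1_calls counter are all hypothetical names made up for illustration. It
only shows the shape of the idea, namely that the SHA1 key is computed on
the commit path and nowhere else, while MD5 stays the checksum handed back
to callers.

```python
import hashlib


class RepStore:
    """Toy stand-in for a shared representation collection (hypothetical)."""

    def __init__(self):
        self._reps = {}       # assumption: SHA1 hex digest -> stored bytes
        self.sha1_calls = 0   # counts how often the extra hash is computed

    def commit(self, data):
        """Store data and return (md5, sha1). SHA1 is computed only here."""
        md5 = hashlib.md5(data).hexdigest()    # checksum used everywhere else
        sha1 = hashlib.sha1(data).hexdigest()  # extra work, commit path only
        self.sha1_calls += 1
        # Reuse an existing representation when the content is already stored.
        self._reps.setdefault(sha1, data)
        return md5, sha1

    def read(self, sha1):
        """Reads look up an already-known key; nothing is hashed."""
        return self._reps[sha1]

    def rep_count(self):
        return len(self._reps)


store = RepStore()
md5_a, key_a = store.commit(b"same contents")
md5_b, key_b = store.commit(b"same contents")   # shares the first rep
md5_c, key_c = store.commit(b"other contents")
data = store.read(key_a)                        # no SHA1 work on this path
```

After the three commits above the store holds two representations, and
sha1_calls is 3: one extra hash per commit, none per read.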

I don't have a strong opinion about this matter -- just making sure that we
all maintain a sense of perspective about the real effects of the proposed
change.

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2008-10-22 15:24:41 CEST

This is an archived mail posted to the Subversion Dev mailing list.