On Wed, Oct 22, 2008 at 5:49 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> Greg Stein wrote:
>> We *still* have all the problems that md5 is fully-intertwined in our
>> code. I'm still not willing to do double-checksums and kill millions
>> of coders for a few researchers who could simply tar their candidate
>> pairs together, or gzip them. Yes, that's the brutal truth :-P ... the
>> researchers need to use workarounds, and the millions get a fast
>> product.
>
> Please forgive my ignorance in this matter, but when you look at a typical
> profile -- from a user's perspective -- of Subversion's bottlenecks, will a
> second hash calculation even register? Surely network turnarounds and
> working copy I/O dwarf this additional calculation in terms of cost to the
> user's time. Don't they? Keep in mind that this calculation is only ever
> made at commit time, too. It isn't as if we trade in SHA1 currency all over
> the place.

The client does not use SHA1 "all over the place." It only uses MD5
values; any SHA1 is accidental because the client simply has no use
for it. Until you rebuild a large body of code, the client *can't* use
SHA1 values. Therefore, any keying using SHA1 implies running an
independent checksum. That CPU time adds up, and frankly, saying that
I/O is our bottleneck rests (IMO) on a false premise. I intend to fix
that, so we do a lot less I/O, and the additional CPU work will then
become more noticeable. How much? Unclear, but CPU isn't always
available simply to burn away.
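
To make the "independent checksum" cost concrete, here is a minimal
sketch in Python (purely illustrative: Subversion itself is C, and the
function names md5_only and md5_and_sha1 are hypothetical, not anything
in our code). It hashes the same stream once with MD5 alone and once
with MD5 plus SHA1, so the extra CPU of the second digest can be timed
on your own machine:

```python
import hashlib
import io
import time


def md5_only(stream, chunk_size=64 * 1024):
    """Read the stream once and return its MD5 hex digest."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


def md5_and_sha1(stream, chunk_size=64 * 1024):
    """Read the stream once, feeding every chunk to both digests."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)  # the additional, independent checksum
    return md5.hexdigest(), sha1.hexdigest()


if __name__ == "__main__":
    # Assumption: 64 MiB of in-memory data, so no disk I/O is measured.
    payload = b"\x00" * (64 * 1024 * 1024)

    start = time.perf_counter()
    md5_only(io.BytesIO(payload))
    single = time.perf_counter() - start

    start = time.perf_counter()
    md5_and_sha1(io.BytesIO(payload))
    double = time.perf_counter() - start

    print(f"MD5 only:   {single:.3f}s")
    print(f"MD5 + SHA1: {double:.3f}s")
```

The numbers depend heavily on the CPU and the hash implementation, so
treat this as a way to measure, not as evidence for either side.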

Cheers,
-g
Received on 2008-10-22 15:03:25 CEST