Ben Collins-Sussman wrote:
>On Thu, 2004-04-22 at 14:28, Mike Mason wrote:
>
>>If Subversion is storing MD5s for the text bases anyhow, shouldn't the 
>>comparison use the MD5 instead of being byte-for-byte?
>
>Hm, but that would force us to always read the *entire* text-base file. 
>At the moment, we bail out as soon as we see a difference between the
>working and text-base files.
>
>My first thought was:  the current algorithm is faster than
>checksumming, because we can bail early.  Most changes aren't at the
>very end of a file.
>
>But then Karl pointed out that while "on average" our current algorithm
>bails after reading half the text-base, this is cancelled out by the
>fact that it's reading two files instead of one.  So maybe the
>byte-for-byte and checksum strategies come out even?  :-)
>
Well, I figure most of the time my working files are going to be cached 
by the operating system[1] because I've been working on them. CPUs are 
pretty fast, so it's disk I/O we're worried about here -- does that make a 
difference? As someone already pointed out, I guess this isn't really 
that important unless you're storing big files that change content but 
not size (like, er, maybe a disk image).
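
To make the trade-off concrete, here is a rough Python sketch of the two
strategies being discussed. It is not Subversion's actual code: the function
names and the chunk size are made up for illustration, and it assumes the MD5
of the text base is already recorded somewhere, so the checksum path only has
to read the working file.

```python
import hashlib
import os

CHUNK = 64 * 1024  # arbitrary read size, chosen for illustration only


def differs_bytewise(working_path, text_base_path):
    """Read both files in step and stop at the first differing chunk."""
    # Cheap shortcut: files of different sizes cannot be identical.
    if os.path.getsize(working_path) != os.path.getsize(text_base_path):
        return True
    with open(working_path, "rb") as w, open(text_base_path, "rb") as b:
        while True:
            w_chunk = w.read(CHUNK)
            b_chunk = b.read(CHUNK)
            if w_chunk != b_chunk:
                return True   # bail out early, rest of both files unread
            if not w_chunk:
                return False  # both files reached EOF with no difference


def differs_by_md5(working_path, stored_md5_hex):
    """Read only the working file, always in full, and compare digests."""
    md5 = hashlib.md5()
    with open(working_path, "rb") as w:
        for chunk in iter(lambda: w.read(CHUNK), b""):
            md5.update(chunk)
    return md5.hexdigest() != stored_md5_hex
```

The byte-for-byte version touches two files but can stop early; the checksum
version touches one file but always reads all of it. Which one wins in
practice would depend on how much of each file is already in the OS cache.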
Mike.

[1] For useful values of "operating system" -- Windows XP seems to like 
having 300 megs of free RAM on my 1 gig machine and then endlessly 
grinding the disk for me.
Received on Thu Apr 22 22:00:06 2004