
Re: File modification detection?

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-04-22 22:12:28 CEST

Ben Collins-Sussman <sussman@collab.net> writes:

> On Thu, 2004-04-22 at 14:28, Mike Mason wrote:
>
>> If Subversion is storing MD5s for the text bases anyhow, shouldn't the
>> comparison use the MD5 instead of being byte-for-byte?
>
> Hm, but that would force us to always read the *entire* text-base file.
> At the moment, we bail out as soon as we see a difference between the
> working and text-base files.
>
> My first thought was: the current algorithm is faster than
> checksumming, because we can bail early. Most changes aren't at the
> very end of a file.
>
> But then Karl pointed out that while "on average" our current algorithm
> bails after reading half the text-base, this is cancelled out by the
> fact that it's reading two files instead of one. So maybe the
> byte-for-byte and checksum strategies come out even? :-)
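To make the trade-off concrete, here is a minimal Python sketch of the two strategies. It assumes the text base's MD5 is already stored, as Mike says, and it ignores keyword and EOL translation. The function names and the `CHUNK` size are hypothetical illustrations, not Subversion's API.

```python
import hashlib

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def differs_bytewise(working_path: str, base_path: str) -> bool:
    """Byte-for-byte: reads BOTH files, but stops at the first differing chunk."""
    with open(working_path, "rb") as w, open(base_path, "rb") as b:
        while True:
            wc = w.read(CHUNK)
            bc = b.read(CHUNK)
            if wc != bc:
                return True   # bail out early on the first difference
            if not wc:
                return False  # both files ended at the same point


def md5_of(path: str) -> str:
    """Checksum the whole file, one chunk at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def differs_by_checksum(working_path: str, stored_base_md5: str) -> bool:
    """Checksum: reads the ENTIRE working file once; the text base is never read."""
    return md5_of(working_path) != stored_base_md5
```

In this sketch, a modified file lets `differs_bytewise` stop early. An unmodified file, which is the usual case when scanning a working copy, makes it read both files in full, while `differs_by_checksum` reads only one.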

It's important to remember svn:keywords and svn:eol-style when
discussing working files and text bases. When we do a byte-for-byte
comparison on a file with either of those properties set, the entire
working file is first read and "detranslated", and a new temporary
file in repository format is written. It's that temporary file that
takes part in the byte-for-byte comparison, and it's quite possible
that the write is the main performance hit.

A possible optimisation would be to store a second md5sum, this one
for the translated working file, and then detect modifications by
checksumming the working file alone, avoiding the detranslate/write
altogether. A possible problem is that the user may have edited an
expanded keyword, causing the md5sum comparison to indicate a
modification, whereas detranslation would drop that edit and indicate
no modification.
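The keyword pitfall can be shown with a small Python model. It is hypothetical throughout: the stored "working-file MD5" is the proposed second checksum, not something Subversion records. The current behaviour is modelled with a checksum of the detranslated bytes rather than a temporary file. The keyword pattern is the same simplification as before.

```python
import hashlib
import re

# Simplified pattern for an expanded keyword such as "$Rev: 42 $";
# the keyword set is an assumption for this sketch.
_EXPANDED = re.compile(rb"\$(Id|Rev|Author|Date|URL): [^$\n]* \$")


def detranslate(data: bytes) -> bytes:
    """Simplified detranslation: CRLF -> LF, then contract expanded keywords."""
    return _EXPANDED.sub(rb"$\1$", data.replace(b"\r\n", b"\n"))


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def modified_by_working_md5(current: bytes, stored_working_md5: str) -> bool:
    """Proposed fast path: checksum the working file as-is, no detranslate/write."""
    return md5(current) != stored_working_md5


def modified_by_detranslation(current: bytes, stored_base_md5: str) -> bool:
    """Current behaviour, modelled with checksums: detranslate, then compare."""
    return md5(detranslate(current)) != stored_base_md5


# Hypothetical state recorded at checkout/update time.
checked_out = b"$Rev: 42 $\r\nhello\r\n"
working_md5 = md5(checked_out)
base_md5 = md5(detranslate(checked_out))

# The user hand-edits the expanded keyword and nothing else.
edited = b"$Rev: 99 $\r\nhello\r\n"
print(modified_by_working_md5(edited, working_md5))  # True: reports a modification
print(modified_by_detranslation(edited, base_md5))   # False: the edit is dropped
```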

-- 
Philip Martin
Received on Thu Apr 22 22:12:43 2004

This is an archived mail posted to the Subversion Dev mailing list.
