
Re: Using md5sum for svn status

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2005-04-27 01:36:26 CEST

kfogel@collab.net writes:

> Marcus Rueckert <darix@web.de> writes:
>> According to Ben (sussman) the current change detection does the
>> following steps:
>>
>> 1. load entries file into memory
>> 2. stats the file
>> 3. if the timestamps match -> returns NOT_CHANGED
>> 4. if the timestamps differ it stats the text base.
>> 5. if the size of text base and file differ -> returns CHANGED
>> 6. if the sizes match it does a byte-by-byte comparison.
>>
>> I think step 6 can be optimized a bit.
>> The entries file has the md5sum of the text-base stored.
>> Why don't we just read the working file and md5sum the content?
>> This way we only need to read 1 file into memory (the working file) and
>> the md5sum algorithm might be faster than the diff algorithm.
>>
>> any comments?
>
> To calculate an MD5 sum, you must read every byte in the file.
>
> To discover that there is some difference between two files, you must
> read, on average, halfway through both files -- once you encounter a
> mismatch, you can stop. (This is why Unix 'diff' and 'cmp' are not the
> same thing.)
>
> So I don't see that there's a big win here...

That ignores keyword expansion and eol conversion. When svn:keywords
or svn:eol-style is in use we "detranslate" the working file before
doing a byte-for-byte comparison between the detranslated file and the
text-base. If we stored the md5sum of the translated file we could
avoid the detranslation; that could well be a win.
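
To make the trade-off concrete, here is a rough Python sketch of the two
strategies discussed in this thread. It is not Subversion's code. The
function names, and the recorded_mtime and recorded_working_md5 values,
are hypothetical stand-ins for whatever the entries file would hold
(today it holds only the md5sum of the text-base, not of the translated
working file). The first check follows steps 2-6 from the quoted list
and assumes no keyword or eol translation is in effect.

```python
import hashlib
import os

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def files_differ(path_a, path_b):
    """Byte-for-byte comparison that stops at the first mismatching chunk."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(CHUNK)
            chunk_b = b.read(CHUNK)
            if chunk_a != chunk_b:
                return True   # early exit: no need to read the rest
            if not chunk_a:   # both files ended at the same point
                return False


def md5_of(path):
    """MD5 of a file; unlike files_differ, this always reads every byte."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(working, text_base, recorded_mtime):
    """Steps 2-6 of the quoted list, assuming no translation is needed."""
    st = os.stat(working)
    if st.st_mtime == recorded_mtime:
        return False                          # step 3: timestamps match
    if st.st_size != os.stat(text_base).st_size:
        return True                           # step 5: sizes differ
    return files_differ(working, text_base)   # step 6


def is_modified_by_checksum(working, recorded_mtime, recorded_working_md5):
    """Hypothetical variant: compare against a stored checksum of the
    translated working file, so no detranslated copy has to be produced."""
    if os.stat(working).st_mtime == recorded_mtime:
        return False
    return md5_of(working) != recorded_working_md5
```

The checksum variant reads only the working file, but always to the end.
The comparison variant reads two files and can stop early, but when
translation is in use it first needs the detranslated copy, which is the
cost the stored checksum would avoid.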

-- 
Philip Martin
Received on Wed Apr 27 01:39:22 2005

This is an archived mail posted to the Subversion Dev mailing list.