Re: Using md5sum for svn status

From: Peter McNab <mcnab_p_at_melbpc.org.au>
Date: 2005-04-27 06:57:46 CEST

Peter McNab wrote:

> kfogel@collab.net wrote:
>
>> Marcus Rueckert <darix@web.de> writes:
>>
>>
>>> According to Ben (Sussman), the current change detection takes the
>>> following steps:
>>>
>>> 1. load the entries file into memory
>>> 2. stat the working file
>>> 3. if the timestamps match -> return NOT_CHANGED
>>> 4. if the timestamps differ, stat the text base
>>> 5. if the sizes of the text base and the working file differ -> return CHANGED
>>> 6. if the sizes match, do a byte-by-byte comparison
>>>
>>> I think step 6 can be optimized a bit.
>>> The entries file already stores the MD5 sum of the text base.
>>> Why don't we just read the working file and compute the MD5 sum of its
>>> content? This way we only need to read one file into memory (the working
>>> file), and the MD5 algorithm might be faster than the diff algorithm.
>>>
>>> Any comments?
>>>
>>
>>
>> To calculate an MD5 sum, you must read every byte in the file.
>>
>> To discover that there is some difference between two files, you must
>> read, on average, halfway through both files -- once you encounter a
>> mismatch, you can stop. (This is why Unix 'diff' and 'cmp' are not the
>> same thing.)
>>
>> So I don't see that there's a big win here...
>>
>> Best,
>> -Karl
>>
>>
>>
> I'm a little surprised that this "summary" info (steps 1..5) isn't
> obtained by querying the server, with the file brought down only if,
> and only when, step 6 is required.
> I was looking at how TortoiseSVN stores a base working copy of files
> with the WC, which is great for instant diffs but is not so
> meaningful for binary files.
> Somehow I think an option to get and hold the summary info locally and
> only download the full file when absolutely necessary might be a good
> thing.
> We are hearing from folks on the list who have 500 MB binaries, so
> avoiding unnecessary downloads and local duplication of these might be a
> productive goal for Subversion.
>
> Peter
>
>
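
Ben's six-step check, with step 6 being the early-exit comparison Karl contrasts with MD5, can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Subversion's actual code: the `Entry` record and its field names are hypothetical stand-ins for whatever the entries file really holds, the 64 KiB chunk size is arbitrary, and timestamps are compared as plain `st_mtime` floats.

```python
import os
from dataclasses import dataclass

CHUNK_SIZE = 64 * 1024  # arbitrary read size chosen for this sketch


@dataclass
class Entry:
    """Hypothetical stand-in for one parsed record of the entries file.

    Step 1 (loading the entries file) is assumed to have produced this.
    """
    working_path: str
    text_base_path: str
    recorded_mtime: float  # assumed: mtime recorded when the file was last known clean


def files_differ(path_a: str, path_b: str) -> bool:
    """Compare two files chunk by chunk, stopping at the first mismatch (like cmp)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(CHUNK_SIZE)
            chunk_b = fb.read(CHUNK_SIZE)
            if chunk_a != chunk_b:
                return True  # difference found; the rest of both files is never read
            if not chunk_a:
                return False  # both files ended with no difference


def is_modified(entry: Entry) -> bool:
    # Step 2: stat the working file.
    work_st = os.stat(entry.working_path)
    # Step 3: unchanged timestamp -> NOT_CHANGED without reading any content.
    if work_st.st_mtime == entry.recorded_mtime:
        return False
    # Step 4: timestamps differ, so stat the text base.
    base_st = os.stat(entry.text_base_path)
    # Step 5: different sizes -> CHANGED.
    if work_st.st_size != base_st.st_size:
        return True
    # Step 6: same size -> byte-by-byte comparison with early exit.
    return files_differ(entry.working_path, entry.text_base_path)
```

Steps 3 and 5 answer without opening either file; only step 6 reads content, and it stops reading as soon as one chunk differs.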

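Marcus's proposed replacement for step 6, hashing the working file and comparing the result with the MD5 sum already recorded for the text base, might look like the sketch below. The function names are hypothetical, and the recorded checksum is assumed to be available as a hex string; how Subversion actually stores it is not shown here.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # arbitrary read size chosen for this sketch


def md5_of_file(path: str) -> str:
    """Hex MD5 of a file, streamed in chunks. Every byte must be read."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified_by_checksum(working_path: str, recorded_md5_hex: str) -> bool:
    """Proposed step 6: compare against the stored text-base checksum
    (assumed to be a hex string) instead of opening the text base itself."""
    return md5_of_file(working_path) != recorded_md5_hex.lower()
```

Here only the working file is opened, but it is always read to the end, whereas a byte-by-byte comparison reads both files yet can stop at the first mismatch. That is the trade-off Karl points out.
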
---------------------------------------------------------------------
Received on Wed Apr 27 07:00:34 2005