Re: Using md5sum for svn status

From: <kfogel_at_collab.net>
Date: 2005-04-27 00:51:57 CEST

Marcus Rueckert <darix@web.de> writes:
> According to Ben (sussman) the current change detection does the
> following steps:
>
> 1. load entries file into memory
> 2. stats the file
> 3. if the timestamps match -> returns NOT_CHANGED
> 4. if the timestamps differ it stats the text base.
> 5. if the size of text base and file differ -> returns CHANGED
> 6. if the sizes match it does a byte-by-byte comparison.
>
> I think step 6 can be optimized a bit.
> The entries file has the md5sum of the text-base stored.
> Why don't we just read the working file and md5sum its content?
> This way we only need to read 1 file into memory (the working file) and
> the md5sum algorithm might be faster than the diff algorithm.
>
> any comments?
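
For reference, the quoted steps 2-6 can be sketched roughly as below.
This is Python used purely for illustration, not Subversion's actual
code: the function name is_modified and the recorded_mtime parameter
(standing in for the timestamp kept in the entries file) are invented
for this example.

```python
import os

CHUNK = 64 * 1024  # read size for the byte-by-byte comparison


def is_modified(working_path, text_base_path, recorded_mtime):
    """Hypothetical sketch of the quoted steps; not Subversion's real API."""
    st = os.stat(working_path)                          # step 2: stat the file
    if st.st_mtime == recorded_mtime:                   # step 3: timestamps match
        return False
    if st.st_size != os.stat(text_base_path).st_size:   # steps 4-5: sizes differ
        return True
    # Step 6: sizes match, so compare contents, stopping at the first mismatch.
    with open(working_path, "rb") as work, open(text_base_path, "rb") as base:
        while True:
            chunk_w = work.read(CHUNK)
            chunk_b = base.read(CHUNK)
            if chunk_w != chunk_b:
                return True
            if not chunk_w:  # both streams ended without a mismatch
                return False
```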

To calculate an MD5 sum, you must read every byte in the file.

To discover that there is some difference between two files, you must
read, on average, halfway through both files -- once you encounter a
mismatch, you can stop. (This is why Unix 'diff' and 'cmp' are not the
same thing.)
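
A rough sketch of that difference, in Python purely for illustration
(the function names and the 4096-byte chunk size are invented here, and
none of this is Subversion code). It counts how many bytes each
approach has to read: the comparison stops at the first mismatching
chunk, while the MD5 sum always consumes the whole stream.

```python
import hashlib
import io

CHUNK = 4096  # arbitrary read size chosen for this sketch


def first_difference_cost(a, b):
    """Compare two binary streams chunk by chunk.

    Returns (differ, bytes_read_from_a); stops at the first mismatch.
    """
    read = 0
    while True:
        chunk_a = a.read(CHUNK)
        chunk_b = b.read(CHUNK)
        read += len(chunk_a)
        if chunk_a != chunk_b:
            return True, read
        if not chunk_a:  # both streams ended with no mismatch
            return False, read


def md5_cost(stream):
    """Hash a binary stream; returns (hex_digest, bytes_read)."""
    digest = hashlib.md5()
    read = 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            return digest.hexdigest(), read
        digest.update(chunk)
        read += len(chunk)


if __name__ == "__main__":
    base = bytes(1_000_000)      # stand-in for the text base
    changed = bytearray(base)
    changed[10] = 1              # a single differing byte near the start
    # The comparison stops after the first 4096-byte chunk of each stream.
    print(first_difference_cost(io.BytesIO(base), io.BytesIO(bytes(changed))))
    # The checksum has to read all 1,000,000 bytes.
    print(md5_cost(io.BytesIO(bytes(changed)))[1])
```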

So I don't see that there's a big win here...

Best,
-Karl

Received on Wed Apr 27 01:23:31 2005
