Marcus Rueckert <darix@web.de> writes:
> According to Ben (sussman), the current change detection performs the
> following steps:
>
> 1. load the entries file into memory
> 2. stats the file
> 3. if the timestamps match -> returns NOT_CHANGED
> 4. if the timestamps differ, it stats the text base.
> 5. if the sizes of the text base and the file differ -> returns CHANGED
> 6. if the sizes match it does a byte-by-byte comparison.
>
> I think step 6 can be optimized a bit.
> The entries file has the md5sum of the text-base stored.
> Why don't we just read the working file and md5sum its contents?
> This way we only need to read 1 file into memory (the working file) and
> the md5sum algorithm might be faster than the diff algorithm.
>
> Any comments?
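A minimal Python sketch of the six quoted steps, with the proposed MD5 variant of step 6 behind a flag. `EntryRecord`, `contents_equal`, `md5_matches`, and `is_modified` are hypothetical names for illustration only; they are not Subversion's actual C API, and the real entries-file format and timestamp handling differ.

```python
import hashlib
import os
from dataclasses import dataclass

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


@dataclass
class EntryRecord:
    """Hypothetical stand-in for one file's record in the entries file."""
    recorded_mtime: float  # timestamp saved when the file was last known clean
    text_base_path: str    # path to the pristine copy (text base)
    text_base_md5: str     # hex MD5 of the text base


def contents_equal(path_a, path_b):
    """Step 6: compare chunk by chunk, stopping at the first mismatch."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(CHUNK)
            chunk_b = b.read(CHUNK)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted at the same point
                return True


def md5_matches(path, expected_hex):
    """Proposed step 6: hash every byte of the working file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


def is_modified(working_path, entry, use_md5=False):
    st = os.stat(working_path)                          # step 2
    if st.st_mtime == entry.recorded_mtime:             # step 3: NOT_CHANGED
        return False
    base_size = os.stat(entry.text_base_path).st_size   # step 4
    if base_size != st.st_size:                         # step 5: CHANGED
        return True
    if use_md5:                                         # proposed step 6
        return not md5_matches(working_path, entry.text_base_md5)
    return not contents_equal(working_path, entry.text_base_path)  # step 6
```

Only step 6 differs between the two paths: the comparison opens both files, while the MD5 variant opens just the working file but must read it to the end.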
To calculate an MD5 sum, you must read every byte in the file.
To discover that there is some difference between two files, you must
read, on average, halfway through both files -- once you encounter a
mismatch, you can stop. (This is why Unix 'diff' and 'cmp' are not the
same thing.)
So I don't see that there's a big win here...
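To make the byte-counting argument concrete, a small sketch that counts how many bytes each approach reads. The helper names are hypothetical, and in-memory buffers stand in for real files; it ignores OS caching and the CPU cost of hashing, so it illustrates the reading argument only, not real timings.

```python
import hashlib
import io


def bytes_read_by_compare(a: bytes, b: bytes, chunk: int = 4096) -> int:
    """cmp-style early exit: total bytes read from both inputs."""
    fa, fb = io.BytesIO(a), io.BytesIO(b)
    total = 0
    while True:
        ca, cb = fa.read(chunk), fb.read(chunk)
        total += len(ca) + len(cb)
        if ca != cb or not ca:  # stop at first mismatch, or at EOF
            return total


def bytes_read_by_md5(data: bytes, chunk: int = 4096) -> int:
    """Hashing has to consume every byte of its input."""
    f = io.BytesIO(data)
    digest = hashlib.md5()
    total = 0
    for block in iter(lambda: f.read(chunk), b""):
        digest.update(block)
        total += len(block)
    return total


size = 1 << 20                      # 1 MiB
base = bytes(size)
work = bytearray(base)
work[100] = 1                       # differs early in the file
work = bytes(work)
print(bytes_read_by_compare(base, work))  # 8192
print(bytes_read_by_md5(work))            # 1048576
print(bytes_read_by_compare(base, base))  # 2097152
```

With a mismatch near the start, the comparison stops after one 4 KiB chunk from each side, while hashing reads the whole file. If the files turn out to be identical, the comparison reads both to the end.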
Best,
-Karl
Received on Wed Apr 27 01:23:31 2005