I use Subversion on Windows.
If I run "svn status" in a repo with 3200 files, SVN does 310000 file
operations such as open/read/close/stat.
It looks like it's comparing the contents of EACH file with its base
file (the pristine copy in the .svn administrative area). That's
overkill: it reads hundreds of megabytes from the disk for no real
reason.
A better solution is:
1) check if the timestamp and size match; if they do, mark the file as
not modified and move on
2) otherwise, compute the SHA1 hash of the file and compare it to the
checksum in the entries file; if equal, mark the file as not modified,
otherwise the file has been modified
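The two steps above can be sketched as follows (a minimal illustration
in Python; the stored_size / stored_mtime / stored_sha1 parameters are
hypothetical stand-ins for what SVN records about each file, not its
actual entries-file schema):

```python
import hashlib
import os

def is_modified(path, stored_size, stored_mtime, stored_sha1):
    st = os.stat(path)
    # Step 1: cheap metadata comparison -- no file contents are read.
    if st.st_size == stored_size and int(st.st_mtime) == stored_mtime:
        return False
    # Step 2: hash the working file and compare against the stored hash.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(16384), b""):
            h.update(chunk)
    return h.hexdigest() != stored_sha1
```

Note that step 2 only runs when the metadata check fails, so in the
common case (nothing changed) no file contents are read at all.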
I understand that this will not be as accurate as the current method,
but is this small loss of accuracy really relevant? Reading several
hundred megabytes of data from the disk just to check that something is
not modified, when we already know with 99.9% certainty that it isn't
modified, is overkill.
If the loss of accuracy is not acceptable, at least provide an option to
enable step 1) above.
There is always a trade-off between accuracy and performance. For
instance, make just checks the timestamps of the files, that's accurate
enough for make, and make is used in plenty of business applications. In
what real world scenarios will the size/time of a file remain constant
even though it has been modified? Does SVN really have to sacrifice a
great deal of performance just to handle this rare case properly?
Besides, I noticed the following things:
* The block size for the read operation is 512 bytes. Using a more
sensible block size, like 16k, would reduce the number of system calls.
Increasing the block size has no downside here, since the files are
always read sequentially from start to end.
* Instead of reading both the full working copy file and the full base
file, compute the SHA1 hash of the working copy file and compare it
with the checksum stored in the 'entries' file. That would reduce the
number of bytes that need to be read from the disk by roughly half.
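The effect of the block size on the number of system calls can be
estimated with simple arithmetic (a rough model: one read call per full
or partial block, plus the final zero-byte read that signals EOF):

```python
def count_reads(file_size, block_size):
    # Ceiling division for the data blocks, plus one EOF read.
    return -(-file_size // block_size) + 1

# For a 3 MB file:
print(count_reads(3 * 1024 * 1024, 512))    # 6145 read calls
print(count_reads(3 * 1024 * 1024, 16384))  # 193 read calls
```

So moving from 512-byte to 16k blocks cuts the number of read calls by
a factor of 32, while the total bytes read stay the same.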
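A sketch of that hash check, assuming the recorded checksum is
available as a hex string (recorded_sha1 is a hypothetical stand-in for
the value stored in the entries file):

```python
import hashlib

def matches_recorded_hash(path, recorded_sha1, block_size=16384):
    # Reads only the working copy file; the base file is never touched.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest() == recorded_sha1
```

The byte-by-byte compare reads both the working file and the base file;
this reads only the working file, hence the roughly 50% reduction.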
Received on Tue Apr 26 14:54:00 2005