
Re: Checkout really slow in Windows with lots of files in one directory

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 26 Jan 2011 17:15:40 +0100

On Wed, Jan 26, 2011 at 3:41 PM, Campbell Allan
<campbell.allan_at_sword-ciboodle.com> wrote:
>
> On Wednesday 26 Jan 2011, Neil Bird wrote:
>>    We have a graphics-oriented code-base that's auto-generated and has
>> >5000 source files in one directory.  While I can check this out OK on
>> Linux, we're seeing an unusable slow-down on Windows XP (NTFS), both using
>> Tortoise directly, and as a test on Linux with the Windows drive mapped
>> over CIFS.
>>
>>    The checkout starts sensibly enough, but then gets steadily slower and
>> slower and slower, to the point where we're not sure it'd actually ever end.
>>
>>    I know that there's a negative speed difference on NTFS, and that 1.7's
>> WC-NG might make this better, but this is getting near-logarithmically
>> slower.
>>
>>    Is that to be expected, or at least known about?
>>
>>
>>    (we're going to jigger the files around into sep. directories to get the
>> individual counts down;  I expect that to help in this instance).
>
> That is what I recall from previous reports. I was originally going to see if
> anything could be done, as it sounds like a classic problem of a linear
> search/sort over a growing list. The big unanswered question was where this
> list was.
>
> If the code is auto-generated, would it be possible to generate it for each
> build? That's what we typically do where I work. Anything that is generated
> is not committed. A crude example: if I have Java source code, I don't need
> to commit the compiled byte code or jars too.

I seem to remember that this has something to do with the way the svn
client determines unique names for its temp files during such a
checkout. Something like: it first tries the filename 'tmpfile1'; if
that exists it tries 'tmpfile2', then 'tmpfile3', and so on. So when
it's checking out file number 5000, it first tries 4999 filenames that
are already in use, and only then concludes that 'tmpfile5000' is the
unique filename it should use. That could explain the ever-slowing-down
behavior you see as more and more files in the same dir get checked
out.
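
To make the cost concrete, here's a tiny sketch of that kind of
sequential probing (plain Python, purely illustrative -- not
Subversion's actual code, and the 'tmpfile' naming is just an
assumption on my part):

    import os

    def unique_name_sequential(directory):
        # Try 'tmpfile1', 'tmpfile2', ... until a free name is found.
        # For the N-th file created in the same directory this probes
        # roughly N names, so 5000 files cost on the order of
        # 5000*5000/2 existence checks in total.
        i = 1
        while True:
            candidate = os.path.join(directory, 'tmpfile%d' % i)
            if not os.path.exists(candidate):
                return candidate
            i += 1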

I'm not entirely sure, but I vaguely remember that this came up as a
thread on the users list or on the dev list (unfortunately, I can't
find it right now). I also seem to remember that this was fixed on
trunk, so it should be much better in 1.7: the unique filenames are
chosen randomly (and then checked for existence) instead of with an
incrementing number. Again, I can't find the commit or dev-list
discussion, but it's floating in the back of my head somewhere...
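
If I remember the approach correctly, the fix boils down to something
like the following (again just an illustrative sketch under my
assumptions, not the real WC-NG code):

    import os, random

    def unique_name_random(directory, attempts=1000):
        # Pick a random suffix first, then check for a collision.
        # With a big enough namespace the expected number of probes
        # stays near 1 no matter how many files already exist in the
        # directory, which is what removes the slowdown.
        for _ in range(attempts):
            name = 'tmpfile%08x' % random.getrandbits(32)
            candidate = os.path.join(directory, name)
            if not os.path.exists(candidate):
                return candidate
        raise IOError('no unique temp name found in ' + directory)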

If I have more time, I'll try to search the archives some more.

In the meantime (while we're waiting for 1.7 :-)): splitting it up
into multiple directories seems like a good workaround...

Cheers,

-- 
Johan
Received on 2011-01-26 17:16:41 CET
