
Re: Severe performance issues with large directories

From: Paul Holden <paul.holden_at_gmail.com>
Date: Fri, 9 Apr 2010 14:54:40 +0100

Hi Bert,

Many thanks for the quick response.

> I think you can find a lot of issues similar to your issue in our issue
> tracker.

Searching fail on my part - sorry for re-treading old ground.

> For WC-NG we move all the entries data in a single wc.db file in a .svn
> directory below the root of your working copy. This database is accessed via
> SQLite, so it doesn't need the chunked rewriting or anything of that. (It
> even has in-memory caching and transaction handling, so we don't have to do
> that in Subversion itself any more)

Sounds great. We have quite a deep, dense directory structure and so a
full update (or any walk over the whole working copy) involves
accessing hundreds of subdirectories. Merging is particularly
painful. I imagine this could help a great deal.

> > 2) subversion appears to generate a temporary file in .svn\prop-base\ for
> > every file that's being updated. It's generating filenames sequentially,
> > which means that when 5,800 files are being updated it ends up doing this:
> >
> > file_open tempfile.tmp? Already exists!
> > file_open tempfile.2.tmp? Already exists!
> > file_open tempfile.3.tmp? Already exists!
> > ...some time later
> > file_open tempfile.5800.tmp? Yes!
> Wow.
> Are you sure that this is in prop-base, not .svn/tmp?

Yes, definitely. Each of these files has a svn:mime-type property of
'application/octet-stream', so I guess it's that (the property isn't
changing between updates, however).

> For 1.7 we made the tempfilename generator better in guessing new names, but
> for property handling we won't be using files in 1.7. (Looking at these
> numbers and those that follow later in your mail, we might have to look into
> porting some of this back to 1.6).

I'd love to see this in 1.6, as it's biting us quite hard right now -
to the extent that we're seriously discussing moving this stuff out of
version control (which is terrifying). I'm sure we'll switch over to
1.7 as soon as we can however.

> Properties will be moved in wc.db, to remove the file accesses completely.
> (We can update them with the node information in a single transaction;
> without additional file accesses)

Again, sounds great :)
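
For anyone following along, here is a rough sketch of what "node information
plus properties in a single transaction" could look like. This is only an
illustration using Python's sqlite3 module: the table names (nodes, props),
the columns, and the helpers make_db and update_node are all made up for the
example and are not the actual wc.db schema or any Subversion API.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a HYPOTHETICAL two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (path TEXT PRIMARY KEY, revision INTEGER NOT NULL);
        CREATE TABLE props (path TEXT NOT NULL, name TEXT NOT NULL,
                            value TEXT NOT NULL);
        INSERT INTO nodes (path, revision) VALUES ('a.bin', 1);
        """
    )
    return conn


def update_node(conn, path, revision, props):
    """Update a node's revision and replace its properties atomically."""
    # "with conn" wraps the statements in one transaction: it commits if
    # the block succeeds and rolls everything back if any statement fails.
    with conn:
        conn.execute(
            "UPDATE nodes SET revision = ? WHERE path = ?", (revision, path)
        )
        conn.execute("DELETE FROM props WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO props (path, name, value) VALUES (?, ?, ?)",
            [(path, name, value) for name, value in props.items()],
        )


conn = make_db()
update_node(conn, "a.bin", 2, {"svn:mime-type": "application/octet-stream"})
```

The point of the sketch is that no per-file temp file is needed: either both
the node row and its properties change, or neither does.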

> > Is there any inherent reason these files need to be generated
> > sequentially?
> > From reading the comments in 'svn_io_open_uniquely_named' it sounds like
> > these files are named sequentially for the benefit of people looking at
> > conflicts in their working directory. As these files are being generated
> > within the 'magic' .svn folder, is there any reason to number them
> > sequentially? Just calling rand() until there were no collisions would
> > probably give a huge increase in performance.
> In 1.7 we have a new api that uses a smarter algorithm, but we can't add
> public apis to 1.6 now.

It's a shame that the api would need to change to support this. I
suppose checking to see if the tempfile was being generated under
'.svn/prop-base' and using an alternative strategy is too gross? (I'm
half joking)
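
To put a number on the difference: if none of the earlier temp files has been
removed, sequential probing needs 1 + 2 + ... + N open attempts for N files,
which for 5,800 files is 5800 * 5801 / 2 = 16,822,900 attempts. The sketch
below compares that with random names. It is a Python illustration, not the
Subversion code: open_sequential, open_random and count_probes are
hypothetical helpers, and the 64-bit random suffix is my assumption about
what a "smarter" generator might do.

```python
import os
import secrets
import tempfile


def open_sequential(directory, stem="tempfile", suffix=".tmp"):
    """Try stem.tmp, stem.2.tmp, stem.3.tmp, ... until creation succeeds."""
    attempt = 1
    while True:
        name = stem + suffix if attempt == 1 else f"{stem}.{attempt}{suffix}"
        try:
            # O_CREAT | O_EXCL fails if the file already exists.
            fd = os.open(os.path.join(directory, name),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return fd, attempt
        except FileExistsError:
            attempt += 1


def open_random(directory, stem="tempfile", suffix=".tmp"):
    """Try random names until creation succeeds (collisions are rare)."""
    attempt = 1
    while True:
        name = f"{stem}.{secrets.token_hex(8)}{suffix}"
        try:
            fd = os.open(os.path.join(directory, name),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return fd, attempt
        except FileExistsError:
            attempt += 1


def count_probes(opener, n):
    """Create n temp files in a fresh directory; return total open attempts."""
    total = 0
    with tempfile.TemporaryDirectory() as directory:
        for _ in range(n):
            fd, attempts = opener(directory)
            os.close(fd)
            total += attempts
    return total


n = 200
print("sequential:", count_probes(open_sequential, n))
print("random:    ", count_probes(open_random, n))
```

The sequential total grows quadratically with the number of files, while the
random variant should stay at roughly one attempt per file.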

> > In case it's relevant, I'm using the CollabNet build of subversion on
> > Windows 7 64bit. Here's 'svn --version':
> >
> > C:\dev\CW_br2>svn --version
> This issue is actually worse on Windows than on Linux, because NTFS is a
> fully transactional filesystem with more advanced lock handling, and so
> it needs to do more work to open a file. (Some tests I performed 1.5
> years ago indicated that NTFS is more than 100 times slower at handling
> extremely small files than the EXT3 filesystem on Linux, while throughput
> within a single file is not far apart).

Yeah - we're seeing the same issue on some of our Linux boxes. The
problem is still there, but it's not as severe.

Many thanks,
Received on 2010-04-09 15:55:10 CEST

This is an archived mail posted to the Subversion Dev mailing list.