
Re: Performance of svn+ssh vs. file for multiple files

From: Nico Kadel-Garcia <nkadel_at_gmail.com>
Date: Wed, 7 Jul 2010 07:08:28 -0400

On Tue, Jul 6, 2010 at 2:17 PM, Eric Peers <eric_at_missinglinktools.com> wrote:
> Howdy,
>
> I've got a program that needs to checkout specific files at specific
> versions. In this particular case a branch does not make sense. I have found
> that the performance of svn+ssh in this case is very bad.
>
> I run the rough equivalent of:
> svn update -r 2 file1 file2 file3 file4 file5
> svn update -r 3 file6 file7 file8 file9 file10

Ouch. Why not build a branch directory with a bunch of "svn:externals"
settings for this?
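A minimal sketch of that suggestion, assuming Subversion 1.6 or later (needed for file externals) and the 1.5+ externals syntax `-r REV URL LOCALPATH` with `^/` repository-root-relative URLs. The `trunk` prefix, the `pins` mapping and the helper name are hypothetical. The script only prints a property value; it does not talk to the repository.

```python
import posixpath
from collections.abc import Mapping, Sequence


def build_externals(pins: Mapping[int, Sequence[str]], repo_dir: str = "trunk") -> str:
    """Build an svn:externals value that pins each file to a revision.

    pins maps a revision number to repository paths, relative to the
    hypothetical repo_dir. Each line uses the Subversion 1.5+ form
    "-r REV ^/path LOCALNAME"; pinning single files needs 1.6+ file externals.
    """
    lines = []
    seen = set()
    for rev in sorted(pins):
        for path in pins[rev]:
            local = posixpath.basename(path)
            # Every file lands in the directory that carries the property,
            # so two files with the same basename would collide.
            if local in seen:
                raise ValueError(f"duplicate local name: {local}")
            seen.add(local)
            url = "^/" + posixpath.join(repo_dir, path)
            lines.append(f"-r {rev} {url} {local}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Revisions and file names taken from the example commands in the post.
    print(build_externals({2: ["file1", "file2"], 3: ["file6", "file7"]}), end="")
```

The printed value could be saved to a file and attached with `svn propset svn:externals -F externals.txt .`, then committed; a plain `svn update` of that directory fetches every pinned file. Whether this is actually faster over svn+ssh than the per-file updates has not been measured here.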

> Overall, I have about 100 such files and 2 svn update calls. I've
> accomplished this with an xargs front end to svn so as not to overrun the
> command line.
>
> If I use file:/// as the protocol, it runs in 3 seconds.
> If I use svn+ssh:// as the protocol, it takes 53 seconds.
> If I run an svn update -r 3 with no files, it takes about 2s.
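To reproduce timings like these, the sketch below can stand in for the xargs front end. It splits the file list into fixed-size batches, as `xargs -n` would, and times each `svn update -r REV` run by wall clock. The batch size of 50 and the helper names are assumptions. The demo only prints the commands; handing each one to `timed_run()` executes and times it for real.

```python
import subprocess
import time
from collections.abc import Iterator, Sequence


def batched_update_cmds(rev: int, paths: Sequence[str], batch: int = 50) -> Iterator[list[str]]:
    """Yield svn update command lines with at most `batch` paths each."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    for i in range(0, len(paths), batch):
        yield ["svn", "update", "-r", str(rev), *paths[i:i + batch]]


def timed_run(cmd: Sequence[str]) -> float:
    """Run one command, raise on a non-zero exit, return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    files = [f"file{n}" for n in range(1, 11)]  # placeholder names from the post
    # Dry run: print each command line instead of executing it.
    for rev, chunk in ((2, files[:5]), (3, files[5:])):
        for cmd in batched_update_cmds(rev, chunk):
            print(" ".join(cmd))
```

Running the same batches once against a file:/// working copy and once against an svn+ssh:// one gives a like-for-like comparison of the two access methods.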

Opening a separate SSH connection has a significant start-up cost for
every session. That's pretty much built into the protocol.
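A back-of-the-envelope check of that explanation, using only the numbers reported in the question: about 100 files, 3 s over file:// versus 53 s over svn+ssh. It assumes the extra time is spread evenly across the files. The post does not measure that; it is an assumption.

```python
def per_file_overhead(slow_s: float, fast_s: float, n_files: int) -> float:
    """Extra seconds per file, assuming the two runs differ only in protocol."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    return (slow_s - fast_s) / n_files


if __name__ == "__main__":
    # Figures from the original post: ~100 files, 53 s (svn+ssh) vs 3 s (file://).
    print(f"{per_file_overhead(53, 3, 100):.2f} s per file")  # prints: 0.50 s per file
```

Roughly half a second per file is plausible for a fresh SSH login plus svnserve start-up on every path. That fits the suspicion in the question, but it is an inference, not a measurement.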

Can you do the checkout via svn:// or file://, then use "svn switch
--relocate" to point the working copy at the svn+ssh:// repository when
you need to push changes?
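A minimal sketch of that workflow, assuming the svn:// and svn+ssh:// URLs reach the same repository (relocation checks the repository UUID). All URLs and helper names are hypothetical. Note that an svn+ssh:// path is usually the repository's absolute path on the server, while an svn:// path is relative to the svnserve root. Subversion 1.6 and earlier spell the command `svn switch --relocate FROM TO [PATH]`; Subversion 1.7 added `svn relocate FROM TO [PATH]`.

```python
import subprocess
from collections.abc import Sequence


def relocate_argv(from_url: str, to_url: str, wc: str = ".",
                  svn_1_7_or_later: bool = False) -> list[str]:
    """Build the command that rewrites a working copy's repository URL prefix."""
    if svn_1_7_or_later:
        return ["svn", "relocate", from_url, to_url, wc]
    return ["svn", "switch", "--relocate", from_url, to_url, wc]


def run(cmd: Sequence[str]) -> None:
    """Execute a command; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URLs: fast checkout over svn://, then relocate to svn+ssh:// to commit.
    checkout = ["svn", "checkout", "svn://svn.example.com/repos/trunk", "wc"]
    relocate = relocate_argv("svn://svn.example.com/repos",
                             "svn+ssh://svn.example.com/srv/svn/repos", wc="wc")
    for cmd in (checkout, relocate):
        print(" ".join(cmd))  # hand each one to run() to execute it
```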

You might check that your upstream SSH server has valid reverse DNS
for the IP addresses of your connecting clients: that's an old problem
involving DNS timeouts. If you can't get that, consider configuring
your upstream SSH server not to use reverse DNS lookups. With OpenSSH
that is the "UseDNS no" setting in sshd_config, or you can run the SSH
daemon with 'sshd -u0'.
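One way to test for the reverse-DNS problem on the server is to repeat the lookup OpenSSH performs when DNS use is enabled: a reverse lookup of the client address, then a forward lookup to confirm the name maps back to the same address. The sketch times both lookups. It uses only Python's standard `socket` module. The function name and the example address are assumptions, and the forward check covers IPv4 records only.

```python
import socket
import time


def check_reverse_dns(ip: str) -> tuple[bool, float]:
    """Return (forward-confirmed, seconds spent) for a client IPv4 address.

    A slow or failing lookup here suggests sshd may stall on the same
    lookup unless DNS use is disabled on the server.
    """
    start = time.perf_counter()
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
        _name, _aliases, forward = socket.gethostbyname_ex(name)
        ok = ip in forward
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        ok = False
    return ok, time.perf_counter() - start


if __name__ == "__main__":
    ok, secs = check_reverse_dns("127.0.0.1")  # replace with a real client address
    print(f"forward-confirmed: {ok}, lookup took {secs:.2f}s")
```

Run on the SSH server host with a client's address, a multi-second result points at the DNS timeouts described above.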

> I wrote a direct svn API program to accept the file lists, authenticate
> a single time, and then call svn_update3. This still runs super slow:
> around 53s still.
>
> I suspect the problem is that each individual file is handled separately,
> locked, etc. Is there a way to batch these locks together or otherwise
> improve performance, or to cause the ssh channel/RA session to be reused?
>
> Perusing the source code suggests that svn_client__update_internal will be
> called for each element in my paths. Since an individual file lock/svn
> directory write does not seem to be overly costly, I suspect the problem is
> in the svn_client__open_ra_session_internal + svn_ra_do_update2 calls from
> svn_client__update_internal. Is the Subversion code opening a new
> ra_session for each of these files, at the cost of a new ssh+svnserve on the
> remote end? Is there a way to force a single RA session across all the files
> at the API level without writing my own svn_client__update_internal?
>
> thoughts here?
>
> thanks!
>   --eric

You're doing something complicated: slow performance is... not surprising.
Received on 2010-07-07 13:10:06 CEST

This is an archived mail posted to the Subversion Users mailing list.
