
Feature request: pipelining checkout and update

From: Marc-Antoine Ruel <maruel_at_gmail.com>
Date: Sun, 13 Jan 2008 19:23:05 -0500

Hi

Use case:
I'm syncing a lot of small files, many hundreds of megabytes in total.
I'm on Wifi + VPN + far, far away, which gives a ping > 150 ms. Checkout
is rather slow even though the pipe is > 5 Mbit/s.

Hypothesis:
The sync is limited by the latency to the server, not by the available
bandwidth. If the client synced many files concurrently, in a pipeline,
it would inherently go much faster.
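
As a rough sketch of this hypothesis, the model below charges one round
trip per batch of files plus the time to push the payload through the
link. The function name, the one-round-trip-per-file assumption and the
in_flight parameter are illustrative assumptions, not a description of
what svn actually does. The 150 ms and 5 Mbit/s figures are the lower
bounds quoted above; the file count and total size are made up.

```python
import math

def estimated_seconds(n_files, total_bytes, rtt_s, bandwidth_bps, in_flight=1):
    """Rough transfer-time estimate; assumes one round trip per file (a guess)."""
    # Latency cost: files are requested in batches of `in_flight`,
    # and each batch waits for one round trip.
    latency = math.ceil(n_files / in_flight) * rtt_s
    # Bandwidth cost: the payload still has to cross the link once.
    transfer = total_bytes * 8 / bandwidth_bps
    return latency + transfer

# Hypothetical workload: 2000 files, 40 MB, 150 ms ping, 5 Mbit/s.
sequential = estimated_seconds(2000, 40_000_000, 0.150, 5_000_000)
pipelined = estimated_seconds(2000, 40_000_000, 0.150, 5_000_000, in_flight=10)
print(f"one at a time: {sequential:.0f} s, ten at a time: {pipelined:.0f} s")
```

Under these assumptions the latency term dominates the one-at-a-time
case (about 300 of roughly 364 seconds), and keeping ten requests in
flight brings the estimate down to roughly 94 seconds.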

Testing:
I tried a reduced case to see if there is immediate room for
improvement, so I took a directory that contained only dir1 and dir2.
This was on XP over the https protocol with svn 1.5.0; I don't know the
exact trunk version (sorry).

dir1 261 File(s) 41 Dir(s) 13334007 bytes
dir2 1728 File(s) 770 Dir(s) 30763364 bytes

As you can see, the directories aren't evenly balanced, but this is
still sufficient to show my point. I wanted the two directories to hold
different files to be sure I wasn't affected by any kind of duplicate
detection.

So my tests are:
- Updating the directory that contains both subdirectories: 322 seconds.
- Updating both directories, one after the other: 323 seconds (101 for
  dir1 and 222 for dir2; process startup plus the initial https
  connection results in ~1 second of overhead).
- Updating both subdirectories at the same time: 89 seconds (dir1) and
  216 seconds (dir2).

I couldn't believe that running two checkouts at once was faster than
running them one at a time, so I ran the second and third tests again,
in reverse order.

- Updating both subdirectories at the same time: 111 seconds (dir1) and
  206 seconds (dir2).
- Updating both directories, one after the other: 341 seconds (116 for
  dir1 and 225 for dir2).

Analysis:
So as you can see, the absolute error is very high (>30 seconds!), but
nevertheless it's possible to see that running two checkouts in
parallel takes about as long as the slower one alone (206 to 216
seconds, against 323 to 341 seconds one after the other), which means:
- Neither the server, the client, nor the bandwidth is the limiting factor.
- The limiting factor is something else: the latency to get each file.
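
The arithmetic behind this reading can be checked with a few lines. The
timings are copied from the tests in this mail; treating the parallel
case as finished when the slower of the two updates finishes is my own
framing, and the variable names are only for illustration.

```python
# Wall-clock seconds copied from the measurements in this mail.
runs = [
    {"sequential": 101 + 222, "parallel": (89, 216)},
    {"sequential": 116 + 225, "parallel": (111, 206)},
]

def parallel_wall_time(timings):
    """Two concurrent updates are done when the slower one is done."""
    return max(timings)

# Ratio of one-after-the-other time to side-by-side time for each run.
speedups = [r["sequential"] / parallel_wall_time(r["parallel"]) for r in runs]

for r, s in zip(runs, speedups):
    wall = parallel_wall_time(r["parallel"])
    print(f"sequential {r['sequential']} s, parallel {wall} s, speedup {s:.2f}x")
```

Both runs come out at roughly 1.5x or better, which is larger than the
>30 second run-to-run noise would explain.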

Conclusion:
By pipelining the checkout, i.e. requesting many files at a time, svn
would reduce the effect of the latency.
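
To illustrate the effect, here is a small simulation, not svn code. Each
"file" costs nothing but a simulated round trip, and the client keeps at
most in_flight requests outstanding. The fetch and fetch_all names, the
10 ms stand-in latency and the file count are all hypothetical, and
bounded concurrency is used here as a stand-in for real protocol-level
pipelining.

```python
import asyncio
import time

async def fetch(name, rtt_s):
    """Stand-in for one file request: only waits out the round trip."""
    await asyncio.sleep(rtt_s)
    return name

async def fetch_all(names, rtt_s, in_flight):
    """Fetch every name with at most `in_flight` requests outstanding."""
    gate = asyncio.Semaphore(in_flight)

    async def limited(name):
        async with gate:
            return await fetch(name, rtt_s)

    # gather() returns results in the same order as `names`.
    return await asyncio.gather(*(limited(n) for n in names))

def timed(names, rtt_s, in_flight):
    """Run fetch_all and return (results, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = asyncio.run(fetch_all(names, rtt_s, in_flight))
    return result, time.perf_counter() - start

names = [f"file{i}" for i in range(20)]
_, one_at_a_time = timed(names, 0.01, in_flight=1)
fetched, ten_at_a_time = timed(names, 0.01, in_flight=10)
print(f"1 in flight: {one_at_a_time:.3f} s, 10 in flight: {ten_at_a_time:.3f} s")
```

With one request in flight the waits add up file by file; with ten in
flight they overlap, so the total shrinks even though no single request
got any faster.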

Thanks

Marc-Antoine

Received on 2008-01-14 01:23:17 CET

This is an archived mail posted to the Subversion Dev mailing list.