
Re: Improve svnsync performance over ra_serf.

From: Lieven Govaerts <svnlgo_at_mobsol.be>
Date: 2007-11-04 22:15:41 CET

Time for an update...

I've finished implementing a slightly modified version of the algorithm
described in my original mail. All changes have been committed to the
svnsync_ra_serf branch:
http://svn.collab.net/repos/svn/branches/svnsync_ra_serf/

The functionality is complete; only a few things are left to do:
- svnsync test 3 is failing
- I still have to check the memory usage, as I made some changes to the
pool usage that may need improvement.
- Syncing *to* a repository over ra_serf just doesn't work. Strangely
enough, this doesn't seem to work on trunk either, so I'll investigate
and fix that on trunk.

There's one important change compared to my previous patch:
svn_ra_replay_range now provides both the revision content *and*
revision properties.
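To make the callback shape concrete, here is a minimal Python simulation of a replay_range-style call in which both callbacks receive the revision properties, mirroring the change above and the start/finish callbacks described in the quoted mail below. It is a sketch under stated assumptions, not Subversion's C API: `RevisionData`, `Editor`, `replay_range`, and `demo_callbacks` are hypothetical names, and a list of strings stands in for a real replay report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RevisionData:
    """Hypothetical stand-in for one revision fetched from the master."""
    revision: int
    revprops: Dict[str, str]
    changes: List[str] = field(default_factory=list)  # stands in for the replay report


@dataclass
class Editor:
    """Hypothetical minimal editor that records how it was driven."""
    revision: int
    revprops: Dict[str, str]
    applied: List[str] = field(default_factory=list)
    closed: bool = False

    def apply(self, change: str) -> None:
        self.applied.append(change)

    def close(self) -> None:
        self.closed = True


# revstart receives the revprops and returns the editor to drive;
# revfinish receives the same revprops plus the finished editor.
RevStart = Callable[[int, Dict[str, str]], Editor]
RevFinish = Callable[[int, Dict[str, str], Editor], None]


def replay_range(revisions: List[RevisionData],
                 revstart: RevStart, revfinish: RevFinish) -> None:
    """Replay revisions in order, handing revprops to both callbacks."""
    for rev in revisions:
        editor = revstart(rev.revision, rev.revprops)
        for change in rev.changes:
            editor.apply(change)
        revfinish(rev.revision, rev.revprops, editor)


def demo_callbacks() -> List[Tuple[int, Dict[str, str], List[str]]]:
    """Sync two fake revisions; only non-'svn:' revprops go into the editor."""
    committed = []

    def on_start(revnum: int, revprops: Dict[str, str]) -> Editor:
        custom = {k: v for k, v in revprops.items() if not k.startswith("svn:")}
        return Editor(revnum, custom)

    def on_finish(revnum: int, revprops: Dict[str, str], editor: Editor) -> None:
        editor.close()  # stands in for closing the editor and committing
        committed.append((revnum, editor.revprops, editor.applied))

    revs = [
        RevisionData(1, {"svn:log": "first", "custom:ticket": "42"}, ["add /trunk"]),
        RevisionData(2, {"svn:log": "second"}, ["modify /trunk/README"]),
    ]
    replay_range(revs, on_start, on_finish)
    return committed
```

Passing the revprops to the start callback is what allows step 5 below to put the non-'svn:' properties into the editor before the commit.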

The complete replay_range algorithm is now implemented as follows:

1. Send a PROPFIND request to the master repo to get all revprops of
rev. N.
2. Send a replay REPORT request to the master repo for rev. N.
.. repeat these two steps for the following revisions, with at most 5
open requests at a time.

When receiving a response on the PROPFIND request:
3. Read the revprops

When receiving a response on the REPORT request:
4. Write the 'currently copying' svn:sync property to the slave repo
5. Open the sync editor and add all non-'svn:' revprops to it
6. Parse the replay report and drive the editor
7. Close the editor and commit the transaction
8. Request the list of existing revision properties from the slave repo
9. Write all revision properties other than 'svn:' to the slave repo
10. Remove all extra properties from the slave repo (those not in the
master repo)
11. Write the 'last-merged-revision' svn:sync property to the slave repo
12. Remove the 'currently copying' svn:sync property from the slave repo
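The request/response ordering in steps 1-12 can be simulated in a few lines of Python. This sketch assumes a single pipelined connection whose responses arrive in request order and counts each PROPFIND and each REPORT as one open request; `pipelined_sync`, `fetch`, `on_report`, and `demo_pipeline` are hypothetical names, and no network I/O takes place.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Request = Tuple[str, int]  # ("PROPFIND" or "REPORT", revision number)


def pipelined_sync(start: int, end: int,
                   fetch: Callable[[Request], Any],
                   on_report: Callable[[int, Dict[str, str], Any], None],
                   max_open: int = 5) -> int:
    """Simulate steps 1-12: keep at most max_open requests in flight and
    handle responses in the order the requests were sent, as HTTP/1.1
    pipelining requires. Returns the peak number of open requests."""
    # Steps 1 and 2, queued for every revision in the range.
    pending: Deque[Request] = deque()
    for rev in range(start, end + 1):
        pending.append(("PROPFIND", rev))
        pending.append(("REPORT", rev))

    in_flight: Deque[Request] = deque()
    revprops: Dict[int, Dict[str, str]] = {}
    peak = 0
    while pending or in_flight:
        # Top up the pipeline without waiting for any response.
        while pending and len(in_flight) < max_open:
            in_flight.append(pending.popleft())
        peak = max(peak, len(in_flight))
        # The response to the oldest outstanding request arrives first.
        kind, rev = in_flight.popleft()
        response = fetch((kind, rev))  # a real client would block here
        if kind == "PROPFIND":
            revprops[rev] = response  # step 3: read the revprops
        else:
            # Steps 4-12 run here; the PROPFIND for this revision was
            # sent earlier, so its revprops are already available.
            on_report(rev, revprops.pop(rev), response)
    return peak


def demo_pipeline() -> Tuple[int, List[Tuple[int, List[str], Any]]]:
    """Sync fake revisions 1-4 with canned master responses."""
    handled = []

    def fake_fetch(req: Request) -> Any:
        kind, rev = req
        if kind == "PROPFIND":
            return {"svn:log": f"r{rev}", "custom:prop": "x"}
        return [f"change in r{rev}"]

    def handle(rev: int, props: Dict[str, str], report: Any) -> None:
        handled.append((rev, sorted(props), report))

    peak = pipelined_sync(1, 4, fake_fetch, handle)
    return peak, handled
```

Because the PROPFIND for rev. N is always queued before its REPORT, the revprops are already in hand when the report arrives, so steps 4-12 never have to wait for them.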

In brief, what has changed compared to trunk?
- Steps 1 and 2 used to be synchronous request/reply calls; we now
send multiple requests without waiting for a response.
- The revision properties other than 'svn:' are already included in the
editor, saving one synchronous request per revision property (impact
probably negligible).
- The rest (steps 4-12) are either consequences of these changes or
stayed as they were before.
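The saving from the second point can be expressed as a tiny cost model. It assumes, hypothetically, that every revprop written separately costs one synchronous request and that only the 'svn:' revprops still need separate requests once the others travel in the editor; `split_revprops` and `revprop_round_trips` are illustrative names, not Subversion functions.

```python
from typing import Dict, Tuple


def split_revprops(revprops: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate revprops that can ride along in the commit editor
    (non-'svn:') from the 'svn:' ones."""
    in_editor = {k: v for k, v in revprops.items() if not k.startswith("svn:")}
    separate = {k: v for k, v in revprops.items() if k.startswith("svn:")}
    return in_editor, separate


def revprop_round_trips(revprops: Dict[str, str], bundle_in_editor: bool) -> int:
    """Hypothetical cost model: each revprop written on its own costs one
    synchronous request; revprops bundled into the editor cost nothing extra."""
    if not bundle_in_editor:
        return len(revprops)
    return len(split_revprops(revprops)[1])
```

For a revision carrying svn:log, svn:author, svn:date and two custom properties, the model counts five separate requests without bundling and three with it, which fits the expectation that the impact is small next to the pipelining change.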

In a very small test setup, syncing the first 100 revisions of the
Subversion repository over ra_serf to a slave over ra_local resulted in
a performance gain of 55% compared to trunk ra_serf, and approx. 60%
compared to trunk ra_neon. The gain over slow networks will probably be
bigger, as it's the network communication part that's being optimized.
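Why slower networks should benefit more can be seen in a rough latency model. Counting the '.. wait ..' lines in the serial flow quoted below gives two master and four slave round trips per revision; the pipelined variant is assumed to hide master latency except when the master RTT exceeds what the 5-request window can cover. The functions are hypothetical, bandwidth and server-side processing are ignored, and all parameter values (100 revisions, a local slave with zero RTT, 10 ms of local work per revision) are made up and not meant to reproduce the measurements above.

```python
def serial_time(n_revs: int, master_rtt: float, slave_rtt: float,
                work: float, master_waits: int = 2, slave_waits: int = 4) -> float:
    """Trunk-style loop: every request waits for its reply before the next
    one is sent, so each revision pays every round trip in full."""
    return n_revs * (master_waits * master_rtt + slave_waits * slave_rtt + work)


def pipelined_time(n_revs: int, master_rtt: float, slave_rtt: float,
                   work: float, slave_waits: int = 4, max_open: int = 5) -> float:
    """Branch-style loop: master requests (two per revision) stay in flight.
    Steady state is limited either by local work plus slave round trips,
    or by how many revisions the window can deliver per master RTT."""
    local = slave_waits * slave_rtt + work
    revs_in_flight = max_open / 2
    per_rev = max(local, master_rtt / revs_in_flight)
    return master_rtt + n_revs * per_rev  # one extra RTT to fill the pipeline


def speedup(master_rtt: float) -> float:
    """Serial/pipelined ratio for 100 revisions, a local slave (RTT 0)
    and 10 ms of local work per revision (all hypothetical values)."""
    return (serial_time(100, master_rtt, 0.0, 0.01)
            / pipelined_time(100, master_rtt, 0.0, 0.01))
```

In this model the speedup grows with the master round-trip time, because the serial loop pays that latency twice per revision while the pipelined loop pays it roughly once per window.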

Lieven

Lieven Govaerts wrote:
> The attached patch is a work in progress to improve, hopefully
> drastically, the performance of 'svnsync sync' over ra_serf.
>
> The problem with 'svnsync sync' right now is that it's a serial process,
> with lots of waiting on both the master and slave server. Basically it
> comes down to this:
> -> set 'currently-copying' revprop on rev 0 of the slave repository
> .. wait on slave server response ..
> -> send replay report request for rev. N to master
> .. wait on master server response ..
> -> parse replay report and drive editor
> -> commit rev. N on the slave repository
> .. wait on slave server response ..
> -> read all revprops from the master
> .. wait on master server response ..
> -> add all revprops to the slave repository
> .. wait on slave server response ..
> -> reset 'currently-copying' revprop
> .. wait on slave server response ..
> -> ... start all over for rev. N+1.
>
> While there's little we can do to reduce these waiting times for
> ra_neon, ra_serf has HTTP pipelining support, which comes in handy. What
> I'd like to do is this:
> -> send request to get all revprops from rev. N from the master
> -> send replay report request for rev. N to master
> -> send request to get all revprops from rev. N+1 from the master
> -> send replay report request for rev. N+1 to master
> -> read the revprops for rev. N
> -> while opening the editor, add revprops to it.
> -> parse replay report for rev. N and drive editor
> -> commit rev. N & revprops on the slave repository
> .. wait on slave server response ..
> -> back to step 1.
>
> The attached patch does part of this. It already eliminates most of the
> time spent waiting on the master server for the replay report, by
> sending multiple replay requests over the pipeline while still parsing
> them one by one. The other parts of the above flow are unchanged.
>
> The patch adds a new RA API function, svn_ra_replay_range, which takes a
> range of revisions and two callback functions. The first callback
> function (report start) is used to create the editor; the second
> callback function (report finished) closes the editor and copies all
> revprops. With this patch I had to create a second connection to copy
> the revprops; I hope to eliminate this second connection again at a
> later stage.
>
> Some small tests show that the patch improves svnsync performance by
> about 15% (with a relatively fast network connection to the master
> server). While it passes the svnsync regression tests over all 4 RA
> layers, I've encountered some issues while testing it with different
> repositories, so it's not ready to commit.
>
> Review of the above approach and the patch is welcome. Specifically, I'd
> like to know whether I'm breaking any rules concerning atomic behavior,
> race conditions, etc.
>
> Lieven

Received on Sun Nov 4 22:15:52 2007

This is an archived mail posted to the Subversion Dev mailing list.
