On Thu, Apr 15, 2010 at 02:09:21PM -0400, Andersen, Krista wrote:
> I have noticed some odd sync behavior since we upgraded to 1.6.9 about four weeks ago.
>
> (Mostly we are pleased with the improved sync performance over 1.6.3 - yay! However...)
>
> First: I have seen commits involving a large number of files (over 700 paths listed in the log) fail with the output:
> Transmitting file data ................svnsync: REPORT of 'http://serverName/parentDirectory/repoName': Could not read response body: connection was closed by server (http://serverName)
>
> When we saw sync issues due to revision size in 1.6.3, the output usually said something about a chunk size delimiter, so this message is a little new. I attempted the same fix that we relied on before: I created an incremental dumpfile of this troublesome revision, sent it to the mirror server, and loaded it into the mirror repository. This seemed fine; there was no error output.
>
> Second: When I tried to sync the repos again, I received another failure message:
> Failed to get lock on destination repos, currently held by 'serverName:2d2b076c-df40-c50c-eb0a-e4a3a768c044'
> Failed to get lock on destination repos, currently held by 'serverName:2d2b076c-df40-c50c-eb0a-e4a3a768c044'
> Failed to get lock on destination repos, currently held by 'serverName:56e74c67-1445-4659-927b-9ff36a270e3f'
> Failed to get lock on destination repos, currently held by 'serverName:56e74c67-1445-4659-927b-9ff36a270e3f'
> Failed to get lock on destination repos, currently held by 'serverName:fdb640ac-97b0-c76b-913c-fa50baf48ee9'
> Failed to get lock on destination repos, currently held by 'serverName:fdb640ac-97b0-c76b-913c-fa50baf48ee9'
> svnsync: Couldn't get lock on destination repos after 10 attempts
>
> Usually I can use propdel to remove the sync-lock. However, that does not solve the problem now. A peek into the repo/db/revprops/0/0 file shows there is no sync-lock. I also noticed that the last merged rev was still one rev behind the loaded rev number, so I edited it hoping it might help (it did not).
>
> The other thing that is different in this situation (when compared to sync issues from svn 1.6.3), is that the lock number shown in the output is changing. Normally stale locks showed a consistent number after the serverName in the error message. This output shows a changing number.
>
> svnadmin verify of both the master and mirror locations shows no problems.
> svnadmin lstxns on the mirror repo showed about ten transactions beginning with the number of the last rev before the large commit. Removing these from the mirror repo did not help the stuck sync lock.
>
> So does anyone know - where is this new 1.6.9 lock? Why is it stuck? And how do I get my sync going again?
It sounds like you ran into a known race condition in svnsync,
which leads to svnsync metadata corruption (not repository data
corruption!).
Also, it looks like several svnsync processes were trying to sync
the repository at the same time. Can you provide more information
about how you have set up the syncing process? How many machines
are involved, and where is svnsync run?
I'll try to help you get the sync going again, though it sounds
like you've done most of what I will suggest already:
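First, make sure that no svnsync process is running on the repository.
Then disable whatever starts svnsync (such as a cron job) so that it
cannot start again while you are repairing things.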
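For the "is svnsync still running?" check, a minimal sketch, assuming pgrep(1) is available (it is on most Linux and BSD systems). The function name no_svnsync_running and the PGREP override variable are hypothetical, introduced only for illustration. Note that it can only see processes on the machine it runs on.

```shell
#!/bin/sh
# Sketch (hypothetical helper): refuse to continue while svnsync runs on THIS host.
# svnsync processes on other machines are invisible to this check.
PGREP=${PGREP:-pgrep}   # overridable, e.g. for testing

no_svnsync_running() {
    # pgrep -x matches the exact process name and exits 0 if any process matches.
    if "$PGREP" -x svnsync >/dev/null; then
        echo "svnsync is still running; stop it before repairing the mirror" >&2
        return 1
    fi
    return 0
}
```

Usage would be something like `no_svnsync_running || exit 1` at the top of a repair script.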
You need to figure out the latest revision which was synced to the
slave:
svnlook youngest /path/to/slave/repository
And then make sure that the svnsync:last-merged-rev revision property
at revision 0 matches the revision printed by 'svnlook youngest':
svn pg --revprop -r 0 svnsync:last-merged-rev file:///path/to/slave/repository
If it differs, edit it:
svn pe --revprop -r 0 svnsync:last-merged-rev file:///path/to/slave/repository
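The comparison above can be scripted. This is a minimal sketch, assuming it runs on the slave so that both a local path (for svnlook) and a file:// URL (for svn) are available; the function names and the SVNLOOK/SVN override variables are hypothetical. It only reports a mismatch; fixing the value is still done by hand with 'svn pe'.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): compare the slave's youngest revision with
# the svnsync:last-merged-rev property stored on revision 0.
SVNLOOK=${SVNLOOK:-svnlook}
SVN=${SVN:-svn}

# Print the youngest revision of the repository at local path $1.
youngest_rev() {
    "$SVNLOOK" youngest "$1"
}

# Print svnsync:last-merged-rev from revision 0 of the repository at URL $1.
last_merged_rev() {
    "$SVN" propget --revprop -r 0 svnsync:last-merged-rev "$1"
}

# $1 = slave path, $2 = slave URL.
# Returns 0 if the numbers agree, 1 on mismatch, 2 if a command failed.
check_last_merged() {
    youngest=$(youngest_rev "$1") || return 2
    merged=$(last_merged_rev "$2") || return 2
    if [ "$youngest" -eq "$merged" ]; then
        echo "ok: last-merged-rev is $merged"
        return 0
    fi
    echo "mismatch: youngest is $youngest, last-merged-rev is $merged"
    return 1
}
```

For example: `check_last_merged /path/to/slave/repository file:///path/to/slave/repository`.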
Next, make sure that neither the svnsync:currently-copying nor the
svnsync:lock property is set:
svn pd --revprop -r 0 svnsync:currently-copying file:///path/to/slave/repository
svn pd --revprop -r 0 svnsync:lock file:///path/to/slave/repository
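The same two deletions as a small function, a sketch assuming no svnsync is running and that the URL points at the slave. The function name clear_svnsync_locks and the SVN override variable are hypothetical. Depending on the svn version, deleting a property that is not set may print a warning or fail; the sketch deliberately ignores that, since "not set" is the state we want.

```shell
#!/bin/sh
# Sketch (hypothetical helper): remove leftover svnsync bookkeeping
# properties from revision 0 of the slave. Only run while svnsync is stopped.
SVN=${SVN:-svn}

# $1 = URL of the slave repository
clear_svnsync_locks() {
    url=$1
    for prop in svnsync:currently-copying svnsync:lock; do
        # A property that is already absent is fine, so ignore failures here.
        "$SVN" propdel --revprop -r 0 "$prop" "$url" || true
    done
}
```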
Now start syncing again, and it should work.
If it does not and you cannot figure out why, please ask for
more help.
To avoid corruption of the sync process in the future, you should
make sure that only *one* svnsync process is running at any given time.
On UNIX, this can be done using tools such as lockfile(1), lockf(1),
or the like (lockfile is in the procmail package).
Run a sync script from cron that looks something like this:
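#!/bin/sh
# get the lock (lockfile retries 3 times, then gives up)
LOCKFILE=/tmp/$(basename "$0").lock
lockfile -r 3 "${LOCKFILE}" || exit 1
# release the lock when the script exits, even if svnsync fails
trap 'rm -f "${LOCKFILE}"' EXIT
# run the sync
svnsync ...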
Because the lock file lives on one machine's local disk, this workaround
only helps if every svnsync run is started on that same machine: run
svnsync on either the master or the slave, not both.
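If lockfile(1) is not installed, one commonly used substitute (not part of svnsync or procmail) is a directory lock: mkdir either creates the directory or fails, atomically. A minimal sketch; the LOCKDIR path, the LOCK_WAIT delay and the function names are hypothetical choices. Like the lockfile version, it only serializes runs on one machine, and a lock left behind by a crash has to be removed by hand.

```shell
#!/bin/sh
# Sketch: serialize svnsync runs on one host with an atomic mkdir lock.
LOCKDIR=${LOCKDIR:-/tmp/svnsync-cron.lock}   # hypothetical location
LOCK_WAIT=${LOCK_WAIT:-5}                    # seconds between attempts

# Try to create the lock directory up to $1 times (default 3).
acquire_lock() {
    tries=${1:-3}
    while [ "$tries" -gt 0 ]; do
        if mkdir "$LOCKDIR" 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$LOCK_WAIT"
    done
    return 1
}

release_lock() {
    rmdir "$LOCKDIR"
}

# Run the given command while holding the lock; return the command's status.
with_lock() {
    acquire_lock 3 || { echo "another sync appears to be running" >&2; return 1; }
    "$@"
    status=$?
    release_lock
    return "$status"
}
```

A cron job would then call something like `with_lock svnsync sync URL-of-slave`.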
I don't know when this bug will be fixed yet.
As it stands, people have to resort to an external locking mechanism for
svnsync because svnsync's built-in locking is inherently racy.
Related issues in our bug tracker are:
http://subversion.tigris.org/issues/show_bug.cgi?id=3545
http://subversion.tigris.org/issues/show_bug.cgi?id=3546
If you want to be informed about progress on this bug, you can
add yourself to the Cc list of these issues.
Stefan
Received on 2010-04-15 21:02:47 CEST