Hi folks.
I'm working on some code to automatically flow checkins from n
branches to m branches. Actually, I'm overhauling some old code for
doing this, mostly by rewriting.
The code I'm aiming to replace has had a longstanding problem with SVN
reliability. ISTR that one of the issues was that a change would be
checked in, and then checked back out (or merged), and the change just
made would seem to not be there - for a while, just long enough to
cause a merge issue in an automatic merge program. Then the next time
you check, the change is there, as though nothing had ever been wrong.
And I'm now seeing precisely the issue described above, not in the new
program (not yet, at least) but in a unit test for the new program.
The unit test checks out a file with an integer in it, adds one to
the integer, checks it back in, checks out the prior rev to a
different WC, then merges to the latest rev. It's usually fine doing
this, but once in a while the integer seems not to have been
incremented.
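For concreteness, here is a rough sketch of the sequence the unit test
goes through. It is not the actual test: the helper names, the
counter.txt filename and the commit message are made up for
illustration, and it assumes the svn client prints its usual English
"Committed revision N." line.

```python
import re
import subprocess
from pathlib import Path


def run_svn(*args, cwd=None):
    """Spawn one svn subprocess per operation and return its stdout."""
    result = subprocess.run(
        ["svn", *args, "--non-interactive"],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout


def bump_counter(path):
    """Add one to the integer stored in `path` and return the new value."""
    path = Path(path)
    value = int(path.read_text().strip()) + 1
    path.write_text(f"{value}\n")
    return value


def committed_revision(commit_output):
    """Pull N out of svn's 'Committed revision N.' line (English locale assumed)."""
    match = re.search(r"Committed revision (\d+)\.", commit_output)
    if match is None:
        raise ValueError("no 'Committed revision' line in svn output")
    return int(match.group(1))


def increment_and_merge(url, wc_a, wc_b, filename="counter.txt"):
    """Return True if the merged file in wc_b shows the incremented value."""
    run_svn("checkout", url, str(wc_a))
    expected = bump_counter(Path(wc_a) / filename)
    new_rev = committed_revision(
        run_svn("commit", "-m", "increment counter", cwd=str(wc_a)))
    # Check out the revision just before the commit into a second WC ...
    run_svn("checkout", "-r", str(new_rev - 1), url, str(wc_b))
    # ... then merge everything up to the latest rev into it.
    run_svn("merge", "-r", f"{new_rev - 1}:HEAD", url, str(wc_b))
    merged = int((Path(wc_b) / filename).read_text().strip())
    return merged == expected
```

When the problem shows up, increment_and_merge() would return False
even though the commit itself reported success.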
The test seems to fail more often when it hasn't been run for a
while - running it in a loop seems to succeed more often - but I'm
still planning to leave these tests running over a couple of days and
see what failure percentage I get. So perhaps there's some sort of
race condition, the window for which is widened by data not being in
some cache or other.
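The measurement I have in mind is roughly the following. This is only
a sketch: failure_rate and its parameters are names invented here, and
run_once stands in for whatever callable runs the test once and
returns True on success. Running it once with no pause and once with a
long pause between runs would show whether idle time really matters.

```python
import time


def failure_rate(run_once, runs, pause_seconds=0.0, sleep=time.sleep):
    """Run `run_once` `runs` times and return the percentage of failures.

    `pause_seconds` is the idle gap between consecutive runs; `sleep` is
    injectable so the loop can be exercised without actually waiting.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    failures = 0
    for i in range(runs):
        if i > 0:
            sleep(pause_seconds)  # idle between runs, not before the first
        if not run_once():
            failures += 1
    return 100.0 * failures / runs
```

Comparing failure_rate(test, 100) against, say,
failure_rate(test, 100, pause_seconds=600) would separate the
back-to-back case from the "haven't run it for a while" case.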
I'm using Python with the subprocess module, spawning a distinct svn
subprocess for each operation, as the old code does. I'm not (yet?)
using pysvn or similar.
I don't have access to the SVN server, though I was an SVN
administrator in a past life.
What I'm seeing, and what oldtimers here are describing, sounds
consistent with an inappropriate choice of MPM or other form of
concurrency on the server side. Toward exploring this hypothesis (I
can't even run httpd -l on the server), which MPMs are appropriate for
SVN, and which are not? The legal MPMs appear to be
beos|event|worker|prefork|mpmt_os2 in the *ix sources for Apache
2.2.15 - which is almost certainly not the same version we're using in
production, but hopefully the list of MPMs is still close. And yes,
beos and mpmt_os2 are probably irrelevant ^_^
About all I know about the specifics of this SVN server is that it's
Apache https with Servlet 2.4; JBoss-4.0.4.GA (build:
CVSTag=JBoss_4_0_4_GA date=200605151000)/Tomcat-5.5 - IOW, what I can
get from a "telnet-like" banner. I also know that it has some sort of
cryptography layer involved that goes beyond what https brings - it
may be limited to public key authentication of checkins.
I'm checking out to an NFS mounted home directory, and that directory
is NFS mounted without any caching-specific options. I doubt that's
related though - usually a single client will be fine with NFS; it's
when you have multiple clients writing the same file that there's
trouble with NFS, and I doubt there are many people writing to my home
directory :). The NFS client is RHEL 5.1. The NFS server is some
release of HP-UX.
The SVN server is at least 5 router hops away (traceroute gets puzzled
after a while), probably more - it's far off in terms of network
topology.
The SVN client version is 1.6.4. I googled for a while about SVN and
race conditions, including checking an SVN changelog through 1.6.11,
and didn't find anything related to race conditions that might be
applicable; there was a race condition fixed back in 1.4.x, but I
imagine we'd already have that fix.
Another hypothesis I'd like to discuss here: If I switch my code to
using pysvn instead of a bunch of svn subprocesses, would it be
possible to write pysvn code that will establish a single,
mostly-persistent session to the server, and hence avoid this
(concurrency?) issue? That is, I believe I can authenticate once and
keep the creds around, but will pysvn try to reconnect now and then
implicitly?
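What I picture on the pysvn side is something like the sketch below:
one Client object created up front, with a login callback so the creds
are supplied from memory, and then reused for every operation. The
helper names are made up, and this only shows the credentials being
kept around; whether the library underneath holds a single connection
open or reconnects per operation is exactly the part I don't know.

```python
def make_login_callback(username, password):
    """Build a pysvn login callback that always answers with the same creds."""
    def get_login(realm, suggested_username, may_save):
        # pysvn expects (ok, username, password, save_credentials).
        return True, username, password, False
    return get_login


def make_client(username, password):
    """Create one pysvn.Client meant to be reused for every operation."""
    import pysvn  # third-party; imported here so the rest loads without it
    client = pysvn.Client()
    client.callback_get_login = make_login_callback(username, password)
    return client


# Intended use (needs pysvn and a reachable repository):
#   client = make_client("me", "secret")
#   client.checkout(url, wc_path)
#   client.checkin([wc_path], "increment counter")
```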
TIA!
Received on 2010-05-18 02:03:03 CEST