
Re: How can I setup two svnservers with svnsync and both should provide checkout and checkins

From: Nico Kadel-Garcia <nkadel_at_gmail.com>
Date: Tue, 26 Apr 2011 19:59:03 -0400

On Sun, Apr 24, 2011 at 7:20 AM, Michael Diers <mdiers_at_elego.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2011-04-21 13:58, Nico Kadel-Garcia wrote:
>> On Thu, Apr 21, 2011 at 5:15 AM, Ian Wild <ian.wild_at_wandisco.com> wrote:
>>
>>> That sounds like a good translation to me. The maths gets complicated to put
>>> it mildly, but I know Dr Yeturu's work is in some part at least based on
>>> Paxos ( http://en.wikipedia.org/wiki/Paxos_algorithm ). AIUI we've got the
>>> only implementation of this model that can guarantee the consistency and
>>> ordering of transactions; important when you need your repositories to
>>> remain identical on every site!
>>> Ian
>>
>> *NOTHING* can guarantee this.. This is key to the difficulty of the
>> "merge" process for multiple branches against a common trunk.
>
> Nico,
>
> I believe you misunderstand how the repositories are kept in sync.

There's a fundamental problem that is not mathematical, it is
procedural. When the link between active-active servers for any
database is broken, *or phase delayed for any reason*, and each
database accepts alterations from clients without propagating the
change to a fundamentally shared repository, mathematics cannot decide
which changes must be merged, in which order.

Now, it's certainly possible to bring *down* one of the servers when
the connection is broken, and leave the other active as the primary
database server to allow synchronization with the other database
later. This is how numerous active-active database systems work,
including Oracle and MySQL's systems. But "commits" on Subversion are
an atomic operation: you can't go back later and say "no, wait, that
one conflicted with a commit on another server, we can't take it".
That's what the merge step is for.

>> Maintaining two sets of changes on distinct repositories that wind up
>> altering the same set of text, or code, in divergent ways cannot be
>> guaranteed to be resolved "correctly" by a mechanical process, because
>> the process would have to "understand" the discrepancies and resolve
>> them. It can be as simple as a copyright notice, whose changes
>> represent social information outside the scope of the source control
>> system, or code changes in a subroutine in one branch and the handling
>> of the error codes generated by that subroutine in a distinct branch.
>>
>> It may do a much better *job* of this than other tools. That would be
>> cool, tricky merges have always been an issue. But mere ordering and
>> consistency is not sufficient, overlapping merges require attention.
>
> While your observation regarding merging is correct, it does not apply
> to the replication technique that WANdisco use in their product.
>
> The network nodes handling the repositories are distributed, but not
> isolated. The nodes actively agree on how to apply a proposed change to
> their repository replicas, effectively avoiding the merge problem.

Single mirrored backend database, with synchronization protected by
some sort of locking mechanism to prevent simultaneous commits from the
multiple "active" front ends. This works well if the backend is a
single shared storage system, such as Fibre Channel, or an online
database which handles turning commit requests into single threaded,
atomic operations. But that's not a mathematical data synchronization
issue, that's a single threaded locking problem to prevent out of order
transactions.
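
To illustrate what I mean by a single threaded locking problem, here
is a minimal sketch in Python. The SharedBackend class and the
"front-1"/"front-2" names are made up for this example; this is not
Subversion or WANdisco code, just a lock around a shared revision
counter so that commits from several front ends come out in one order.

```python
import threading

class SharedBackend:
    """Toy shared store: one lock turns concurrent commits into a single ordered sequence."""

    def __init__(self):
        self._lock = threading.Lock()
        self.revisions = []  # list of (revision number, front end, change)

    def commit(self, frontend, change):
        # Only one front end at a time may allocate the next revision number.
        with self._lock:
            rev = len(self.revisions) + 1
            self.revisions.append((rev, frontend, change))
            return rev

backend = SharedBackend()

def frontend_worker(name):
    # Each hypothetical front end submits 100 commits as fast as it can.
    for i in range(100):
        backend.commit(name, f"change {i}")

threads = [threading.Thread(target=frontend_worker, args=(n,))
           for n in ("front-1", "front-2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Revision numbers are gap-free and strictly increasing, whatever the interleaving.
rev_numbers = [rev for rev, _, _ in backend.revisions]
```

That only holds while every front end can actually reach the one lock,
which is the whole point.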

> WANdisco provide a well-written White Paper explaining this.
>
> http://www.wandisco.com/get/?f=documentation/whitepapers/WANdisco_DConE_White_Paper.pdf

Just read it. It confirms my description, implemented as a clever set
of tools to handle master/slave relationships at high speed on the
back end. The "proposers" operate under locking mechanisms handled by
the "acceptors". Sadly, it suffers from the same issue as ill
considered NTP configurations. It takes a poll among the nodes to
decide whether a transaction number is high enough. I leave it to the
rest of the community to decide what happens when you set up a world
distributed network with 3 nodes each in 3 different locations, and
one of those locations gets its VPN connections to the other locations
blocked. The 3 nodes can go off on their merry way, and the central
transaction numbering that is absolutely necessary for Subversion is
*broken* between the two sets of nodes.
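
A toy sketch of that failure, in Python. The Repo class and the commit
messages are invented for illustration, and it assumes both sides of
the broken link keep accepting commits, which is exactly the case I'm
worried about. It does not model the vendor's product.

```python
class Repo:
    """Toy stand-in for a repository replica: an ordered list of commits."""

    def __init__(self, name):
        self.name = name
        self.log = []  # position in the list + 1 is the revision number

    def commit(self, change):
        self.log.append(change)
        return len(self.log)  # the newly assigned revision number

site_a = Repo("A")
site_b = Repo("B")

# While the link is up, both replicas apply the same commits in the same order.
for change in ["add README", "fix build"]:
    site_a.commit(change)
    site_b.commit(change)

# The link breaks and each side accepts a commit on its own.
rev_a = site_a.commit("edit foo.c at site A")
rev_b = site_b.commit("edit bar.c at site B")

# Same revision number, different content: the numbering no longer means one thing.
diverged = rev_a == rev_b and site_a.log[-1] != site_b.log[-1]
```

Both sites now have a revision 3, and they are not the same revision 3.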

When, and how, to turn the relevant repos into read-only nodes is left
as an exercise in resource management and paranoia. But the potential
for fractures and divergence among them is inherent in any network of
more than a few nodes, and switching from "active-active" to
"active-slave" when the link is broken is begging to set up
"slave-slave" for all sorts of confusing scenarios, and breaking the
ability to submit code. And cleaning *UP* the mess is horrible if
they're not set to "slave" behavior.

This sort of thing has been proving problematic for decades, if not
centuries. It certainly dates back to religious schisms in Europe,
where communications lags led to the promulgation of prayer books and
excitement about being able to cheaply print Bibles to provide unified
scripture for priests who'd tend to stray from the teachings of Rome
without reliable, written guidelines. In this case, though, instead of
a central "Pope", which a backend Subversion central repository must
be, you've allowed any set of bishops who are disconnected by a bad
winter or war to set up their own Pope.

Using a "Paxos" algorithm does not solve the problem of disconnected
nodes, unless you're reliant on hardcoded lists of active servers and
are absolutely reliant on a majority of a generous number of
pre-defined nodes being available to provide that "vote". But if
you're detached from more than half the nodes, you can only be
read-only: nothing else is safe. Wrapping it in a set of equations
doesn't fix that. And unless you're very careful, some idiot *will*
rewrite local configurations to reduce that "half" requirement, and
synchronization of the central list of nodes becomes absolutely
critical.
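
The majority rule can be sketched in a few lines of Python. The
function name, the node names, and the three site labels are all
hypothetical, and this is only the "can I see more than half of a fixed
list" check, not a Paxos implementation.

```python
def can_accept_writes(all_nodes, reachable_nodes):
    """True only if a strict majority of the configured node list is reachable.

    reachable_nodes is assumed to include the node doing the asking.
    """
    members = set(all_nodes)
    visible = members & set(reachable_nodes)
    return len(visible) > len(members) // 2

# 3 nodes in each of 3 locations, as a hardcoded membership list.
ALL_NODES = [f"{site}{i}" for site in ("nyc", "lon", "tok") for i in (1, 2, 3)]

# One location loses its links to the other two.
isolated = [n for n in ALL_NODES if n.startswith("tok")]
connected = [n for n in ALL_NODES if not n.startswith("tok")]

minority_ok = can_accept_writes(ALL_NODES, isolated)    # 3 of 9: must go read-only
majority_ok = can_accept_writes(ALL_NODES, connected)   # 6 of 9: may keep committing

# If someone trims the isolated site's local list down to its own 3 nodes,
# it suddenly "has a majority" too, and both sides accept writes.
tampered_ok = can_accept_writes(isolated, isolated)
```

With the shared list intact, only the 6-node side stays writable. With
the list edited locally, both sides do, which is the split I'm
describing.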

It's workable, but potentially fragile, and it is an *old* distributed
computing problem.
Received on 2011-04-27 01:59:34 CEST

This is an archived mail posted to the Subversion Users mailing list.
