RE: working copy storage for RA properties

From: Bill Tutt <billtut_at_microsoft.com>
Date: 2000-12-07 22:48:21 CET

From: Jim Blandy [mailto:jimb@zwingli.cygnus.com]

> > Just a little brush up about transactions first though:
> > Transactions need to satisfy the ACID test:
> > * Atomicity - All or nothing, partial results are not allowed.
> > * Consistency - preserve data integrity
> > * Isolation - execute as if they were run alone
> > * Durability - results aren't lost by a failure
> >
> > Now, having a distributed system obviously makes ensuring atomicity much
> > harder. The solution to this is the two-phase commit protocol. The main
> > idea is to have all "resource managers" save a Durable copy of the
> > transaction's work before any of them commit.
>
> > If one resource manager fails after another commits, the failed system
> > can still commit after it recovers.
>
> > The actual protocol to commit transaction T:
> > Phase 1:
> > T's coordinator asks all participating resource managers (RMs) to
> > "prepare the transaction". Each RM replies "prepared" after T's updates
> > are Durable.
>
> > Phase 2:
> > After receiving "prepared" from all participating RMs, the coordinator
> > tells all RMs to commit.
> > If it receives a "No" vote from ANY RM, the coordinator tells all RMs to
> > abort.
> > Participants acknowledge the work has been done by replying "Done" back
> > to the coordinator.
>
> > There are a lot of annoying details that fall out of this simple concept
> > though.
>
> > The participant MUST NOT do anything until the coordinator tells it to
> > commit or abort the transaction.
> > This is its "uncertainty period". That means that it must retain any
> > locks that are owned by the transaction.
> > Even after a participant restarts (and recovers after a failure), any
> > transactions in an uncertain state are still uncertain.
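
The two phases quoted above can be sketched roughly as follows. This is
only an illustration in Python: the names (Vote, ResourceManager,
two_phase_commit) are made up for the example, durability is merely
simulated by an in-memory state field, and failures, timeouts, and
recovery are left out entirely.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    NO = "no"


class ResourceManager:
    """Hypothetical in-memory RM; a real one would write to stable storage."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self, txn):
        # Phase 1: make T's updates durable (simulated here), then vote.
        if not self.can_prepare:
            self.state = "aborted"
            return Vote.NO
        self.state = "prepared"  # the uncertainty period starts here
        return Vote.PREPARED

    def commit(self, txn):
        self.state = "committed"
        return "Done"

    def abort(self, txn):
        self.state = "aborted"
        return "Done"


def two_phase_commit(txn, rms):
    """Coordinator: returns True if T committed, False if it aborted."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [rm.prepare(txn) for rm in rms]
    # Phase 2: commit only if every vote was "prepared", otherwise abort.
    committed = all(v is Vote.PREPARED for v in votes)
    if committed:
        acks = [rm.commit(txn) for rm in rms]
    else:
        acks = [rm.abort(txn) for rm in rms]
    # Every participant acknowledges with "Done".
    assert all(a == "Done" for a in acks)
    return committed
```

Calling two_phase_commit("T", [ResourceManager("a"), ResourceManager("b")])
commits on both RMs; if either is built with can_prepare=False, both end
up aborted.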

> So the central point is: even if some of the participants' commits
> fail

Actually, the central point is maintaining all properties of ACID:
Atomic, Consistent, Isolated, and Durable.

> it's okay for new requests to get post-commit data from those
> participants who succeeded. They'll never see conflicting pre-commit
> records from a failed participant, since the failed participant's
> portion of the commit has been saved in a durable fashion,
> guaranteeing that it'll re-process the commit when it recovers, before
> answering any new requests. A client may not be able to *reach* all
> the post-commit data, but that's an inherent problem in any
> distributed system --- transactions don't make network links immortal.

> But this implies that a participant can't even answer read-only
> requests for records that have been "prepared" but not "committed" ---
> it must wait for word from the coordinator, since it doesn't know
> whether the other participants are handing out pre- or post-commit
> data. That's the point you're making in the last paragraph.

> Is that right?

Yes, it is. All write locks must be held until you have learned
whether to commit or abort the transaction.
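
To make the locking rule concrete, here is a small Python sketch of a
participant that keeps its write locks through the uncertainty period.
Everything in it (Participant, WouldBlock, resolve) is a hypothetical
name chosen for the example, state lives in plain dictionaries rather
than durable storage, and a real reader would wait instead of getting an
exception.

```python
class WouldBlock(Exception):
    """Raised where a real reader would block waiting for the coordinator."""


class Participant:
    def __init__(self):
        self.committed = {}    # key -> value visible to readers
        self.pending = {}      # txn -> {key: new value}; durable in a real system
        self.write_locks = {}  # key -> txn holding the write lock

    def prepare(self, txn, updates):
        # Vote "No" if another uncertain transaction already locks a key.
        if any(key in self.write_locks for key in updates):
            return "No"
        for key in updates:
            self.write_locks[key] = txn
        self.pending[txn] = dict(updates)
        return "prepared"

    def read(self, key):
        # Prepared-but-unresolved records cannot be served, even read-only.
        if key in self.write_locks:
            raise WouldBlock(key)
        return self.committed[key]

    def resolve(self, txn, commit):
        # Called once the coordinator's decision is known.
        updates = self.pending.pop(txn)
        if commit:
            self.committed.update(updates)
        for key in updates:
            del self.write_locks[key]
```

Between prepare() and resolve(), read() on a touched key raises
WouldBlock; that gap is the window in which writers block readers.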

> So there is still a window where writers can block readers, waiting
> for network traffic. Which sucks, but I'll bet one could prove it's
> inevitable without further data (say, database state sequence numbers
> which would allow you to get consistent states from everybody).

You got it. Nobody said designing a distributed architecture was easy,
and the distributed transaction "uncertainty" problem certainly doesn't help
matters. There are additional tweaks you can make to the protocol above that
would resolve the problem of not being able to reconnect to the
transaction manager, but they add overhead to the two-phase commit
protocol and to the persisted transaction state required to make the
transaction Durable.

Bill
Received on Sat Oct 21 14:36:16 2006