> Just a little brush up about transactions first though:
> Transactions need to satisfy the ACID test:
> * Atomicity - All or nothing, partial results are not allowed.
> * Consistency - preserve data integrity
> * Isolation - execute as if they were run alone
> * Durability - results aren't lost by a failure
> Now, having a distributed system obviously makes ensuring atomicity much
> harder. The solution to this is the two-phase commit protocol. The main idea
> is to have all "resource managers" save a durable copy of the transaction's
> work before any of them commit.
> If one resource manager fails after another commits, the failed system can
> still commit after it recovers.
> The actual protocol to commit transaction T:
> Phase 1:
> T's coordinator asks all participating resource managers (RMs) to "prepare
> the transaction". Each RM replies "prepared" after T's updates are saved
> durably.
> Phase 2:
> After receiving "prepared" from all participating RMs, the coordinator tells
> all RMs to commit.
> If it receives "No" votes from ANY RMs, the coordinator tells all RMs to
> abort.
> Participants acknowledge the work has been done by replying "Done" back to
> the coordinator.
> There are a lot of annoying details that fall out of this simple concept:
> The participant MUST NOT do anything until the coordinator tells it to
> commit or abort the transaction.
> This is its "uncertainty period". That means that it must retain any locks
> that are owned by the transaction.
> Even after a participant restarts (and recovers after a failure), any
> transactions in an uncertain state are still uncertain.
So the central point is: even if some of the participants' commits
fail, it's okay for new requests to get post-commit data from those
participants who succeeded. They'll never see conflicting pre-commit
records from a failed participant, since the failed participant's
portion of the commit has been saved in a durable fashion,
guaranteeing that it'll re-process the commit when it recovers, before
answering any new requests. A client may not be able to *reach* all
the post-commit data, but that's an inherent problem in any
distributed system --- transactions don't make network links immortal.
But this implies that a participant can't even answer read-only
requests for records that have been "prepared" but not "committed" ---
it must wait for word from the coordinator, since it doesn't know
whether the other participants are handing out pre- or post-commit
data. That's the point you're making in the last paragraph.
Is that right?
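That blocking on reads can be made concrete with a small sketch (the class and method names are mine; a real RM would queue or block the read until the coordinator's decision arrives, rather than raise):

```python
class UncertainRM:
    """Shows why a prepared-but-uncommitted record can't serve reads."""

    def __init__(self):
        self.committed = {}   # key -> committed value
        self.prepared = {}    # key -> pending value, awaiting coordinator

    def prepare(self, key, value):
        # Enter the uncertainty period for this record: the RM holds
        # its lock and cannot answer reads until the coordinator decides.
        self.prepared[key] = value

    def resolve(self, key, decision):
        # Word from the coordinator ends the uncertainty period.
        value = self.prepared.pop(key)
        if decision == "commit":
            self.committed[key] = value

    def read(self, key):
        if key in self.prepared:
            # We don't know whether the other participants are handing
            # out pre- or post-commit data, so we must wait.
            raise BlockingIOError("record is in its uncertainty period")
        return self.committed.get(key)
```

Until `resolve` is called, even a read-only `read("x")` is stuck behind the in-flight write, which is exactly the writers-block-readers window described below.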
So there is still a window where writers can block readers, waiting
for network traffic. Which sucks, but I'll bet one could prove it's
inevitable without further data (say, database state sequence numbers
which would allow you to get consistent states from everybody).
Received on Sat Oct 21 14:36:16 2006