
When to use Berkeley transactions.

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2003-02-21 03:20:44 CET

This is the other half of the Discussion Formerly Known As

   "Transcript of chat between me and Sleepycat", which was
   "Checkpoint less frequently", which used to be called
   "Still hang on svn 4951 RedHat 7.3 SMP"

At this point, I think I'll not bother to include former subject lines
with "(Was: ...)" :-)

Brane hasn't described his vision for replacing Berkeley transactions
with locking yet, so it's possible that what I write here will be
superseded by his proposal. However, I'm *pretty* confident that
Subversion's current use of transactions is necessary and appropriate,
and will try to explain why we shouldn't reduce their use.

What is a Berkeley transaction?

A transaction is effectively a private copy of the entire database.
Anything you do in a transaction is invisible to those outside the
transaction; and anything done outside a transaction is invisible to
those inside it. When a transaction is committed, it either entirely
succeeds, or entirely fails. (If this sounds familiar, it's only
because Berkeley and Subversion have some of the same needs; I'm
describing Berkeley transactions here, not Subversion txns :-) ).
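
To make the "private copy" idea concrete, here is a toy model in Python. It is only a sketch: ToyDatabase and ToyTxn are made-up names, not the Berkeley DB API, and the real thing gets its isolation from locking and logging rather than by literally copying the database. The toy also ignores concurrent writers entirely.

```python
import copy


class ToyDatabase:
    """Hypothetical in-memory 'database': just a dict of key -> value."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def begin(self):
        return ToyTxn(self)


class ToyTxn:
    """Hypothetical transaction: works on a private copy until commit."""

    def __init__(self, db):
        self.db = db
        # Private copy: later changes outside the txn are invisible in here,
        # and changes made in here are invisible outside.
        self.view = copy.deepcopy(db.data)

    def get(self, key):
        return self.view.get(key)

    def put(self, key, value):
        self.view[key] = value

    def commit(self):
        # All-or-nothing: the whole private copy becomes the database in
        # one rebinding. (A real system must also deal with other writers;
        # this toy does not.)
        self.db.data = self.view

    def abort(self):
        # Throw the private copy away; the database never saw any of it.
        self.view = None


def demo_commit():
    db = ToyDatabase({"a": 1})
    txn = db.begin()
    txn.put("a", 2)
    txn.put("b", 3)
    seen_outside_before_commit = dict(db.data)
    txn.commit()
    return seen_outside_before_commit, db.data


def demo_abort():
    db = ToyDatabase({"a": 1})
    txn = db.begin()
    txn.put("a", 99)
    txn.abort()
    return db.data
```

Until commit() runs, a reader looking at db.data sees none of the transaction's writes; after abort(), it never will.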

Sometimes, in order to maintain these guarantees, the transaction must
lock records, causing other readers or writers to block until the
transaction completes. (The Berkeley docs have some good examples and
discussion at http://www.sleepycat.com/docs/ref/transapp/inc.html).

Subversion uses Berkeley transactions for almost everything, because
it needs to maintain semantic integrity across tables. For example,
imagine a checkout without transactions:

   1. During the checkout, the fs code reads a representation into
      memory.

   2. Now some commit causes that node to be deltified. The
      representation's key does not change, but the value at that key
      does, because the rep now points to a new (svndiff) string in
      the `strings' table. The old, fulltext string is removed.

   3. Now the checkout thread uses its in-memory rep to fetch the
      node's data. Whups! No string there, sorry.
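
The three steps above can be replayed in a few lines of Python. This is a sketch under stated assumptions: the two dicts are toy stand-ins for the `representations' and `strings' tables, and the key names ("rep-1", "str-1", "str-2") and function names are invented for illustration. None of this is Subversion's actual fs code.

```python
def make_tables():
    # Toy stand-ins for the two tables (hypothetical keys and layout).
    reps = {"rep-1": {"kind": "fulltext", "string_key": "str-1"}}
    strings = {"str-1": b"hello, world\n"}
    return reps, strings


def checkout_read_rep(reps, rep_key):
    # Step 1: the checkout copies the representation into memory.
    return dict(reps[rep_key])


def deltify(reps, strings, rep_key, new_string_key, svndiff_bytes):
    # Step 2: same rep key, new value; the old fulltext string is removed.
    old_string_key = reps[rep_key]["string_key"]
    strings[new_string_key] = svndiff_bytes
    reps[rep_key] = {"kind": "delta", "string_key": new_string_key}
    del strings[old_string_key]


def checkout_fetch_data(strings, in_memory_rep):
    # Step 3: the checkout follows its (now stale) in-memory rep.
    return strings[in_memory_rep["string_key"]]


def run_unprotected_checkout():
    reps, strings = make_tables()
    rep = checkout_read_rep(reps, "rep-1")
    deltify(reps, strings, "rep-1", "str-2", b"<svndiff data>")
    try:
        checkout_fetch_data(strings, rep)
        return "ok"
    except KeyError:
        # The string the stale rep points to is gone.
        return "missing string"
```

Note that each table is perfectly consistent on its own at every moment; what breaks is the relationship between them, as seen by a reader holding an old rep.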

With transactions, there's no problem: step 2 just blocks until the
checkout's transaction closes.
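
Here is a rough sketch of that blocking behavior, again in Python and again with invented names. A single coarse threading.Lock stands in for whatever locks the checkout's transaction would hold; real Berkeley transactions lock at a much finer grain, so treat this as an illustration of the ordering only.

```python
import threading


def run_locked_checkout():
    # Toy tables with hypothetical keys; not Subversion's real schema.
    reps = {"rep-1": {"kind": "fulltext", "string_key": "str-1"}}
    strings = {"str-1": b"hello, world\n"}

    db_lock = threading.Lock()          # stand-in for the transaction's locks
    rep_read = threading.Event()        # checkout has done step 1
    committer_arrived = threading.Event()
    log = []

    def checkout():
        with db_lock:                   # "transaction" open
            rep = dict(reps["rep-1"])   # step 1: read rep into memory
            rep_read.set()
            # Give the committer time to show up and start waiting.
            committer_arrived.wait(timeout=5)
            data = strings[rep["string_key"]]   # step 3: still there
            log.append(("checkout", data))
        # lock released here: "transaction" closed

    def deltify():
        rep_read.wait(timeout=5)
        committer_arrived.set()
        with db_lock:                   # step 2 blocks until checkout is done
            old_key = reps["rep-1"]["string_key"]
            strings["str-2"] = b"<svndiff data>"
            reps["rep-1"] = {"kind": "delta", "string_key": "str-2"}
            del strings[old_key]
            log.append(("deltify", "str-2"))

    threads = [threading.Thread(target=checkout),
               threading.Thread(target=deltify)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The checkout always gets its data, and the deltification always happens afterward, because the committer cannot touch either table while the checkout holds the lock.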

Of course, we could do it without transactions, if we implemented our
own locking scheme. And that locking scheme would be equivalent in
power and scope to... Berkeley transactions!

I suspect this is how transactions got invented. People realized that
individual locking schemes could be abstracted and implemented inside
the database itself, and then they'd never have to worry about it
again.

I'm not saying it wouldn't be more efficient to have our own locking
scheme. It probably would. We know the data intimately; we can take
shortcuts that Berkeley would never dare. But I don't think the gain
would be very large (after all, Sleepycat works helluva hard to
implement transactions as efficiently as they can), and it would come
at a high cost in complexity and new bugs.

So, barring a surprisingly brilliant insight from Brane (I'm not
saying it can't happen...), my feeling is that we should leave
Subversion's use of BDB transactions alone, and concentrate on
changing our checkpointing and other things.


Received on Fri Feb 21 03:53:58 2003

This is an archived mail posted to the Subversion Dev mailing list.