
Re: When to use Berkeley transactions.

From: Branko Čibej <brane_at_xbc.nu>
Date: 2003-02-21 07:58:03 CET

Karl Fogel wrote:

>Brane hasn't described his vision for replacing Berkeley transactions
>with locking yet, so it's possible that what I write here will be
>superseded by his proposal.
>
I hope so.

>However, I'm *pretty* confident that
>Subversion's current use of transactions is necessary and appropriate,
>and will try to explain why we shouldn't reduce their use.
>
>What is a Berkeley transaction?
>
>A transaction is effectively a private copy of the entire database.
>
No, it's not. A transaction is _not_ invisible to other users of the
database, even when not committed. Nobody outside the transaction sees
the changes until they're committed, but everybody sees the locks that
the transaction creates -- and the transaction sees other people's locks.

[skip lots of txn-101 that Karl just invented]

>Of course, we could do it without transactions, if we implemented our
>own locking scheme. And that locking scheme would be equivalent in
>power and scope to... Berkeley transactions!
>
There's another locking scheme, and it's called Berkeley locks. You can
lock and unlock an object (a table row, if you will) without starting a
transaction (and all the logging that involves). It's just that locking
happens automatically when you touch an object from within a transaction.

>I suspect this is how transactions got invented. People realized that
>individual locking schemes could be abstracted and implemented inside
>the database itself, and then they'd never have to worry about it
>again.
>
Oh my. You've missed your vocation; you should be writing history books. :-)

>I'm not saying it wouldn't be more efficient to have our own locking
>scheme. It probably would.
>
It would be a _lot_ more efficient. Every time you create a transaction,
you end up writing to the log files. And every time you commit it, you
have to sync the logs. That's a _lot_ of fsyncs, and they're slowing us
down tremendously. What's worse, we're writing and sync'ing logs even
for read-only operations that don't change a single bit in the database.

> We know the data intimately; we can take
>shortcuts that Berkeley would never dare. But I don't think the gain
>would be very large (after all, Sleepcat works helluva hard to
>implement transactions as efficiently as they can),
>
However hard they work on making transactions efficient, they can't
avoid the fsyncs. In case you're wondering if those fsyncs are really
that bad, hear this: I hardcoded a DB_NOSYNC when opening the database
in fs.c -- and reduced "make check" time on Windows by *half*. Of
course, we can't do that for a production repository, but it makes you
think, doesn't it?
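
To make the fsync cost concrete, here is a minimal sketch in plain Python
file I/O. It is not Berkeley DB and not the DB_NOSYNC experiment described
above; the names append_records and time_append are hypothetical. It writes
the same records twice, once syncing after every record (roughly what a
commit per operation forces) and once syncing only at the end.

```python
import os
import tempfile
import time


def append_records(path, records, sync_each):
    """Append records to a log file, optionally fsync-ing after each one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for rec in records:
            os.write(fd, rec)
            if sync_each:
                # Force this record to stable storage before continuing.
                os.fsync(fd)
        if not sync_each:
            # Batched variant: a single sync once everything is written.
            os.fsync(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)


def time_append(records, sync_each):
    """Return (bytes written, elapsed seconds) for one run in a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log")
        start = time.perf_counter()
        size = append_records(path, records, sync_each)
        return size, time.perf_counter() - start


records = [b"x" * 64] * 100
size_sync, t_sync = time_append(records, sync_each=True)
size_batch, t_batch = time_append(records, sync_each=False)
print("fsync per record: %.4fs, single fsync: %.4fs" % (t_sync, t_batch))
```

Both runs leave identical data on disk; only the number of syncs differs.
The timings depend entirely on the disk and the OS, so no particular ratio
is claimed here.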

>and it would come at a high cost in complexity and new bugs.
>
Our code wouldn't become that much more complex.

>So, barring a surprisingly brilliant insight from Brane (I'm not
>saying it can't happen...), my feeling is that we should leave
>Subversion's use of BDB transactions alone, and concentrate on
>changing our checkpointing and other things.
>
>
There's no surprisingly brilliant insight here. Locking and unlocking a
DB object does not incur the logging and fsync overhead, and it's
sufficient for most read-only operations. That would speed up the reads,
but it would _also_ speed up writes on average because each
txn_checkpoint would have much less work to do.

The question now is, where can we replace txns with locks?

Quoting from your other mail:

> 'revisions':
> Well, only the revprops might be changing. I guess one wants a
> consistent picture of those. So we'd lock just the revision
> record we're reading from, for the duration of the read. During
> that time, someone changing a revprop on that revision would be
> blocked, but that wouldn't be very long, so it's okay.
>
> 'nodes', 'representations', 'strings':
> What do we lock here? Would the locking interfere with
> deltification?
>
I think we always read revision, node and representation records into
memory at once. That means we can safely move from a

    start txn; read; commit txn

pattern to a

    lock row; read; unlock row

pattern, without sacrificing safety or correctness and without adding
code complexity. We have to lock 'revisions' because of rev props, and
'representations' because of deltification, and we _might_ have to lock
'nodes' (have to think about that; it's conceivable that we can skip
locking there completely).
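
The "lock row; read; unlock row" pattern can be sketched as follows. This
is an in-memory illustration only: RowLocks and read_row are hypothetical
names, the "database" is a plain dict, and the lock is an exclusive
threading.Lock rather than a Berkeley shared read lock. The point is just
that the whole record is copied into memory while the row is locked, and
the lock is released even if the read fails.

```python
import threading
from contextlib import contextmanager


class RowLocks:
    """Hypothetical per-row lock table (not a Berkeley DB API)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock dictionary itself
        self._locks = {}

    def _lock_for(self, table, key):
        with self._guard:
            return self._locks.setdefault((table, key), threading.Lock())

    @contextmanager
    def locked(self, table, key):
        lock = self._lock_for(table, key)
        lock.acquire()
        try:
            yield
        finally:
            lock.release()  # always unlock, even if the read raised

    def is_locked(self, table, key):
        return self._lock_for(table, key).locked()


def read_row(db, locks, table, key):
    """lock row; read; unlock row -- the record is copied in one go."""
    with locks.locked(table, key):
        return dict(db[table][key])


db = {"revisions": {1: {"svn:log": "initial import"}}}
locks = RowLocks()
record = read_row(db, locks, "revisions", 1)
```

A writer changing a revprop would take the same lock around its update, so
it blocks only for the duration of the copy.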

In 'strings', we'd have to lock the record so that deltification doesn't
delete it while it's still half read. I can't remember exactly at the
moment (and haven't time to look at the code -- have to get to work),
but I seem to recall that we create a transaction for every partial read
of a string. If that's so, we can easily change to the same pattern as
for the other tables.

I can't recall if we have any operations that read from more than one
table in the same transaction. But even if we do, we can easily extend
the trail concept to locking, and all we have to be careful about is the
order of the locks.
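
A minimal sketch of what "being careful about the order of the locks" could
look like, assuming a fixed global table order. TABLE_ORDER, ordered and
with_locks are hypothetical names, not Subversion's trail API, and the
particular order chosen is arbitrary. If every reader acquires its locks in
the same order, two multi-table readers cannot deadlock against each other.

```python
import threading
from collections import defaultdict
from contextlib import ExitStack

# Assumed global order; any fixed order works as long as everyone uses it.
TABLE_ORDER = ["revisions", "nodes", "representations", "strings"]
RANK = {name: i for i, name in enumerate(TABLE_ORDER)}


def ordered(requests):
    """Sort (table, key) lock requests by table rank, then by key."""
    return sorted(set(requests), key=lambda r: (RANK[r[0]], r[1]))


def with_locks(lock_table, requests, body):
    """Acquire all requested locks in global order, run body, release all."""
    acquired = []
    with ExitStack() as stack:
        for req in ordered(requests):
            stack.enter_context(lock_table[req])  # acquires the lock
            acquired.append(req)
        result = body()
    # ExitStack has released the locks in reverse order by this point.
    return result, acquired


lock_table = defaultdict(threading.Lock)
wanted = [("strings", "s1"), ("revisions", "r1"), ("nodes", "n1")]
result, acquired = with_locks(lock_table, wanted, lambda: "read done")
```

The caller may list the rows in any order; the helper normalizes it before
acquiring anything.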

The nice thing about this, implementation-wise, is that we can make a
/gradual/ move from reads-with-txns to reads-with-locks. Do the easy
ones first ('revisions', 'nodes', 'representations'), then move on to
the hard ones.

All of the above reasoning extends naturally to 'uuids', too.

> 'changes':
> No need to lock this for read-only operations, right?
>
>
I suspect you're right.

    Brane

P.S.: All of the above assumes that we don't have "svn obliterate". Once
we have that operation, we'll definitely have to lock 'nodes' and
'changes'.

-- 
Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
Received on Fri Feb 21 07:58:47 2003
