[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Any FSFS rep-sharing experts out there?

From: David Glasser <glasser_at_davidglasser.net>
Date: Thu, 8 Oct 2009 13:47:15 -0700

On Thu, Oct 8, 2009 at 11:07 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> C. Michael Pilato wrote on Thu, 8 Oct 2009 at 12:51 -0400:
>> Branko Cibej wrote:
>> > The rep-cache database gets opened deep within svn_fs_fs__open and
>> > svn_fs_fs__create. We don't really have a way to distinguish between
>> > open-for-read and open-for-write in svn_fs_open. I can't form an opinion
>> > right now on whether that's a serious omission or not, but in any case
>> > adding an open-mode would be a huge conceptional change, of the svn-2.0
>> > kind, IMHO.
>> I can't see why we'd need to add different access modes.  Why not simply
>> make the code avoid opening the cache database until it is needed?
> I took a stab, see attached.
> The ideas were:
> * centralize reading the config (avoid code duplication)
> * open the DB as late as possible (but don't bother closing it once it's opened)
> * write to the DB only after finishing the FS commit (thus enabling commit_body()
>  to be run outside the sqlite txn)
> It passes tests (C, basic, and commit) (but I'm positive I could have bugs
> that the tests wouldn't catch).

Took a look at this and it generally looks good. I was concerned at
first that there could be a downside to a failure to update the DB
after a successful commit_body, but that was back from when the table
had a "ref count" for each rep; the current code has a different more
arbitrary uniquifier, so this should work.

This raises a question, though: do we even need to do
write_reps_to_cache inside the FSFS write lock? I actually don't
think so. Note that (as of r33408, back on the original feature
branch), Hyrum decided not to care if multiple transactions around the
same time tried to call set_rep_reference with the same checksum.
(Nobody passes TRUE for reject_dup.) If we take the SQLite call out
of the write lock, the only real downside is that revisions committed
very soon after this one can't share reps with it (but if we're a
server with such frequent revisions, we'd probably prefer the extra
concurrency anyway).

In fact, why do you even need to run write_reps_to_cache in a
transaction at all? I think that the DB has to obey this invariant:

  "If at any point, it is possible to read
HASH=>(rev,offset,size,expanded_size) from the DB, then there must
exist a rep for HASH at (rev,offset) with the given size and expanded
size for the indefinite future."

But it doesn't actually need to obey anything stronger than that: it's
not necessary that that entry remain in the database, for example.

So yeah, your patch looks good but I'd go two steps farther and run
the set_rep calls outside of the write lock and not in a transaction.

(This is making the assumption that a series of N non-transaction
SQLite INSERT statements (which is to say, a series of N implicit
transactions) is as efficient as one transaction with N INSERTs.
Maybe it isn't; I'm not sure. Getting it out of the FSFS write lock
would still be good, though.)


glasser_at_davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/
Received on 2009-10-08 22:47:42 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.