On Mon, Aug 02, 2010 at 03:25:48PM -0400, Vallon, Justin wrote:
> > E.g. Subversion's FSFS needs to create a revision file from the commit's
> > transaction, and move the finalized revision file into place.
> > After the revision file has been moved into place successfully, FSFS also
> > updates the svn:date revision property and moves the revision properties file
> > into place (or copies revprop data into an sqlite database if you use
> > revprop packing). Then, it updates the 'current' file which contains the
> > number of the current HEAD revision. If you use representation sharing to
> > save disk space, the commit may involve further updates to yet another
> > sqlite database.
> >
> > All these actions need to complete in order to have a consistent state.
> >
> > If you're interested in seeing the code that does this, look at the
> > svn_fs_fs__commit() and commit_body() functions in
> > http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>
> I see this is executed with a FS write lock. My concern would be
> focused on the interaction between the commit code and any rollback
> code. For example, if the commit dies (at any point during the
> commit), what will be required to ensure that the repository behaves
> as if the commit never started? Will a repo cleanup be required; will
> the next committer cleanup the partial rev automatically (ie:
> overwrite stale files); will the repo be hopelessly inconsistent?
I honestly didn't know so I went and asked.
And learned something!
<stsp> users asking interesting questions: http://mail-archives.apache.org/mod_mbox/subversion-users/201008.mbox/%3C6EC02A00CC9F684DAF4AF4084CA84D5F01C40CD7@DRMBX3.winmail.deshaw.com%3E
<stsp> i dunno how fsfs behaves in face of an interrupted commit; whether or not it needs to be rolled back
<danielsh> if you haven't touched current then the rev file will never be read and will be overwritten
<danielsh> stsp: does that answer your question?
<stsp> i think so
<stsp> because the rev file of the following commit will have the same name to move things into place onto
<danielsh> write lock only for revprop change and commit
<danielsh> :-)
<stsp> so, using rsync for backup is fine?
<danielsh> if you copy current first, yes
<stsp> what's hotcopy for then? just bdb?
<ehu> stsp: copying 'current' first ... :-)
<stsp> ok, so what happens if I don't copy current first?
<danielsh> you can copy revs/
<danielsh> then a commit happens
<danielsh> then you copy current
<danielsh> so you don't have all of revs/ that current claims exist
<stsp> then I need to unwedge it
<stsp> by decrementing current
<danielsh> right
<danielsh> and hopefully you haven't just crossed a packing boundary
<danielsh> eg if you want to decrement from 1002 to 999
<danielsh> and someone packed it already
<danielsh> a bit more work
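The point danielsh makes is that 'current' acts as the commit point. A commit first moves its finished revision file into place and only then bumps 'current'. If the commit dies in between, readers never look at the stray file, and the next commit writes a file with the same name over it. The minimal Python sketch below shows that ordering on a toy repository. It assumes one `revs/<N>` file per revision plus a `current` counter, and the helpers `write_atomically`, `init_repo` and `commit` are hypothetical. It is not FSFS's actual C code in fs_fs.c, which also shards revision files and writes revprops, among other things.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic on POSIX within one filesystem


def read_current(repo):
    """Return the youngest revision recorded in the toy 'current' file."""
    with open(os.path.join(repo, "current")) as f:
        return int(f.read().split()[0])


def init_repo(repo):
    """Create a toy repository holding only revision 0."""
    os.makedirs(os.path.join(repo, "revs"), exist_ok=True)
    write_atomically(os.path.join(repo, "revs", "0"), "")
    write_atomically(os.path.join(repo, "current"), "0\n")


def commit(repo, contents, crash_before_current=False):
    """Publish a new revision; 'current' is updated strictly last."""
    new_rev = read_current(repo) + 1
    # Step 1: move the finished revision file into place. A stray file left
    # by an earlier, interrupted commit has this same name and is overwritten.
    write_atomically(os.path.join(repo, "revs", str(new_rev)), contents)
    if crash_before_current:
        raise RuntimeError("simulated crash before 'current' was updated")
    # Step 2: bump 'current'. Until this rename, readers never look at
    # revs/<new_rev>, so a crash before it leaves the old HEAD intact.
    write_atomically(os.path.join(repo, "current"), f"{new_rev}\n")
    return new_rev


if __name__ == "__main__":
    repo = tempfile.mkdtemp()
    init_repo(repo)
    commit(repo, "first")
    try:
        commit(repo, "doomed", crash_before_current=True)
    except RuntimeError:
        pass
    print(read_current(repo))      # 1: the interrupted commit is invisible
    print(commit(repo, "second"))  # 2: reuses and overwrites revs/2
```

Because nothing points at the stray file, no rollback step is needed.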
So if 'current' says you are at rN but the repository only holds rev data
up to r(N-1), the repository will complain (I've tried that: "No such
revision rN"), and you'll need to decrement the counter in 'current'.
Other than that, the repository will continue to work.
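Decrementing the counter can be sketched in Python as follows. This assumes a simplified, hypothetical layout: 'current' holds the youngest revision number as its first token, and revision N lives in `revs/N` with no sharding and no packing. The helper name `unwedge` is made up for illustration. It should only be run on a backup copy that nothing else is using. It does not handle packed shards, which is the "bit more work" case danielsh mentions.

```python
import os


def unwedge(repo):
    """Lower 'current' to the youngest revision whose file actually exists.

    Hypothetical layout: 'current' holds the youngest revision number, and
    revision N is stored in revs/N (no sharding, no packing). Meant for an
    offline backup copy, so 'current' is rewritten in place.
    Returns (claimed, youngest).
    """
    current_path = os.path.join(repo, "current")
    with open(current_path) as f:
        claimed = int(f.read().split()[0])
    youngest = claimed
    # Step down until we reach a revision whose file was actually copied.
    while youngest > 0 and not os.path.exists(
            os.path.join(repo, "revs", str(youngest))):
        youngest -= 1
    if youngest != claimed:
        with open(current_path, "w") as f:
            f.write(f"{youngest}\n")
    return claimed, youngest
```

It only walks down from the top, mirroring the manual fix of decrementing 'current'. It does not look for gaps further back in history.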
Now, how would rsync, or a file-system snapshot, know that 'current' must
be copied first? Even if you copy 'current' first manually, rsync might
later overwrite it. But unless you use packing, it's trivial to fix the
backup if it breaks, and all you risk is losing the most recent HEAD
revision, which you might not have gotten with a hotcopy anyway.
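One way around the overwrite problem is to read 'current' into memory before copying anything and write that saved value into the backup last. Commits move their revision file into place before bumping 'current', so every revision up to the saved number already exists when the copy starts. The Python sketch below shows this idea. It is not what rsync or `svnadmin hotcopy` does, and the helper name `backup` is hypothetical. It assumes a simplified layout (a 'current' file plus revision files under `revs/`) and that revision files are never moved or rewritten during the copy, which packing would violate.

```python
import os
import shutil


def backup(repo, dest):
    """Copy a repository so the backup's 'current' never runs ahead of its revs.

    Assumes revision files are only ever added, never moved or rewritten
    (no packing while the copy runs).
    """
    # 1. Capture 'current' before anything else. Every revision up to this
    #    number is already on disk, because commits bump 'current' last.
    with open(os.path.join(repo, "current")) as f:
        saved_current = f.read()
    # 2. Copy everything while commits may continue. Newer revision files
    #    that sneak in are harmless: the backup's 'current' won't point at them.
    shutil.copytree(repo, dest, dirs_exist_ok=True)
    # 3. Write the saved 'current' last, replacing whatever newer value the
    #    copy in step 2 may have picked up.
    with open(os.path.join(dest, "current"), "w") as f:
        f.write(saved_current)
    return int(saved_current.split()[0])
```

Holding the value in memory is what keeps a later copy pass from overwriting 'current' with a value that runs ahead of the copied revision files.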
Still, I think I'll keep advising people to use hotcopy.
It avoids the problem of a 'current' file that is too recent, i.e. the
backup is always usable out of the box. And who knows how Subversion's
on-disk formats will change in the future.
The hotcopy approach will always be supported, and works fine if, as you
pointed out, you can make sure that a hotcopy is not being written to
while it is being backed up.
Stefan
Received on 2010-08-02 22:28:25 CEST