On Mon, Aug 02, 2010 at 12:42:31PM -0400, Vallon, Justin wrote:
> I did see that discussion, but it seems to contradict with the claim
> that the database operations are transactional.
This is a frequent misunderstanding of the "atomic commit" concept.
> So, my follow up is: If I unplug the network cable between
> svn-commiter and filesystem, will the repository be corrupt?
When you break the network connection, a commit in flight either
succeeds or fails. The worst that can happen is that you are left with
stale transactions lying around. That's all. The critical section comes
later, when the commit is finalized. At that point, all data has already
been transmitted across the network.
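
Leftover transactions can be listed with "svnadmin lstxns" and removed
with "svnadmin rmtxns". A minimal Python sketch of such a cleanup,
assuming svnadmin is on your PATH; the helper names are made up for
illustration, and you should only run the removal when no commits are
in flight, since a live commit also shows up as a transaction:

```python
import subprocess


def parse_txn_names(lstxns_output):
    """Return transaction names from `svnadmin lstxns` output (one per line)."""
    return [line.strip() for line in lstxns_output.splitlines() if line.strip()]


def rmtxns_command(repos_path, txn_names):
    """Build the `svnadmin rmtxns` command line for the given transactions."""
    return ["svnadmin", "rmtxns", repos_path, *txn_names]


def remove_stale_txns(repos_path):
    """Hypothetical helper: list uncommitted transactions and remove them all.

    Assumes nobody is committing right now; otherwise live transactions
    would be removed along with the stale ones.
    """
    result = subprocess.run(
        ["svnadmin", "lstxns", repos_path],
        check=True, capture_output=True, text=True,
    )
    names = parse_txn_names(result.stdout)
    if names:
        subprocess.run(rmtxns_command(repos_path, names), check=True)
    return names
```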
> If yes, then the underlying database is not atomic and therefore not
No, that's not the case. The fact that unplugging the network cable
causes no havoc does not mean the underlying filesystem is truly
transactional at the lowest level.
Commits are transactional from the user's point of view.
Or you could say, they're transactional from the point of view of
every layer in Subversion *above* the filesystem layer.
If two people commit to the same path at the same time, one commit
will succeed and the other will fail. That's what atomic commits
are about. They're not about consistency of the repository filesystem.
E.g. Subversion's FSFS needs to create a revision file from the commit's
transaction, and move the finalized revision file into place.
After the revision file has been moved into place successfully, FSFS also
updates the svn:date revision property and moves the revision properties file
into place (or copies revprop data into an sqlite database if you use
revprop packing). Then, it updates the 'current' file which contains the
number of the current HEAD revision. If you use representation sharing to
save disk space, the commit may involve further updates to yet another
All these actions need to complete in order to have a consistent state.
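
To illustrate why that matters for snapshots, here is a toy Python model
of such a sequence. It is not the FSFS code: the directory layout and all
function names are simplified assumptions. Each individual rename is
atomic, but a snapshot taken between two of them sees a half-finished
commit.

```python
import os
import tempfile


def move_into_place(path, data):
    """Write data to a temp file, then rename it over path.

    The rename is atomic for this one file only, not for the whole commit.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def finalize_commit(repo_dir, rev, rev_data, revprops, stop_after=None):
    """Toy finalization: revision file, then revprops, then 'current'.

    stop_after=N simulates a snapshot (or crash) right after step N.
    """
    steps = [
        (os.path.join(repo_dir, "revs", str(rev)), rev_data),
        (os.path.join(repo_dir, "revprops", str(rev)), revprops),
        (os.path.join(repo_dir, "current"), "%d\n" % rev),
    ]
    for n, (path, data) in enumerate(steps, start=1):
        move_into_place(path, data)
        if stop_after == n:
            return


def state(repo_dir):
    """What a filesystem-level snapshot taken right now would contain."""
    def names(sub):
        d = os.path.join(repo_dir, sub)
        return sorted(os.listdir(d)) if os.path.isdir(d) else []

    head = None
    current = os.path.join(repo_dir, "current")
    if os.path.exists(current):
        with open(current) as f:
            head = f.read().strip()
    return names("revs"), names("revprops"), head


if __name__ == "__main__":
    repo = tempfile.mkdtemp()
    # Snapshot after step 1: a revision file exists, but there are no
    # revprops for it and 'current' does not know about it yet.
    finalize_commit(repo, 1, "rev data", "svn:date ...", stop_after=1)
    print(state(repo))
```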
If you're interested in seeing the code that does this, look at the
svn_fs_fs__commit() and commit_body() functions in
> If no, then the server can take a snapshot at any point in time, and
> the snapshot is guaranteed to be consistent.
The safest way is to create a hotcopy of the live repository and then
take a copy of the hotcopy (for example, by taking a snapshot of the
filesystem the hotcopy resides on). It's not the most efficient way of
doing it, but it is the most portable and the safest.
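
A hotcopy is made with "svnadmin hotcopy REPOS_PATH NEW_REPOS_PATH".
If you want to script it, a minimal Python sketch could look like this,
assuming svnadmin is on your PATH; the function names and the paths in
the usage note are placeholders, not anything from your setup:

```python
import subprocess


def hotcopy_command(repos_path, backup_path):
    """Build the `svnadmin hotcopy` command line."""
    return ["svnadmin", "hotcopy", repos_path, backup_path]


def make_hotcopy(repos_path, backup_path):
    """Run the hotcopy; raises CalledProcessError if svnadmin fails."""
    subprocess.run(hotcopy_command(repos_path, backup_path), check=True)
```

You would call something like make_hotcopy("/srv/svn/repo",
"/backup/repo-hotcopy") and then snapshot or copy the backup directory
at your leisure, since nothing commits to it.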
You may be lucky and always get a consistent repository state by taking
filesystem-level snapshots of a live repository, but there's no guarantee
that the very latest commit will always be in a consistent state.
Received on 2010-08-02 19:57:09 CEST