[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: packing race condition

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Mon, 05 Jan 2009 07:31:38 -0600

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

It doesn't use the write lock currently, but modifying it to do so would
probably solve the problem. When I originally wrote packing, I intended it to
be single-threaded, and if somebody attempted to run two instances in parallel,
well, Don't Do That. Using the write lock won't let 'svnadmin pack' run any
quicker, (i.e., the process isn't parallelized between two concurrent
instances), but it would prevent the race condition Daniel describes.

When we were chatting on IRC the other day, I wasn't sure if upgrade used the
write lock, or a read-write lock (does FSFS even have such a thing?) and I
didn't have time to check. If we can get away with using the write-lock, which
I believe we can, it seems to be the most maintainable solution going forward.

As an aside, I've been thinking that letting pack notify of progress would be
useful, especially for the big repositories on which it was intended to run.
Thoughts?

- -Hyrum

David Glasser wrote:
> You mean svnadmin pack doesn't just work within the FSFS write lock? That
> would seem logical to me... fits with svnadmin upgrade, eg.
>
> --dave
>
> On Jan 5, 2009 12:22 AM, "Daniel Shahaf" <d.s_at_daniel.shahaf.name> wrote:
>
> Problem:
> Race condition in 'svnadmin pack' may cause corruption (data loss).
>
> Details:
> See pack_shard() in fs_fs.c. Note the two svn_io_remove_dir2() calls.
> Suppose two 'svnadmin pack' run in parallel, and that one of them
> reaches the first call (which removes the pack dir) just after the first
> of them reaches the second call (which removes the non-packed shard dir).
>
> Result:
> The whole shard (that the two 'svnadmin pack' worked on) is physically
> deleted from the (OS) filesystem. Effectively, it does:
>
> % rm -rf db/revs/5/
> % rm -rf db/revs/5.pack/
>
> and then (one of the svnadmin's) throws an error (because it tries to
> pack the shard that had just been removed).
>
> Proof:
> I can reproduce this (on a testing shard_size=4 repository) by placing
> breakpoints in the function around the remove_dir2() calls.
>
> (s/place breakpoints/call svn_cmdline_prompt_user2/)
>
> Fix:
> Hyrum, who identified the problem on IRC, also suggested the solution of
> adding a lock file to 'svnadmin pack'. That looks doable to me, although
> it should require passing an FS object (svn_fs_t) to svn_fs_fs__pack (so
> there is a mutex to pass to with_some_lock()).
>
> I intend to look into a patch along these lines, but posting here for
> comments on the approach.
>
> Daniel
>
> (For example, another way might be to abort if '5.pack' already exists
> (rather than remove it and proceed). But would that work?)
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1004476
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1004537

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkliC7kACgkQCwOubk4kUXzxuwCg2AMzpKrU00Gk3Dxd7YNqMy9M
ep4AoJ0HddavnWaeXtQtCleJhBO7uIyq
=+PGP
-----END PGP SIGNATURE-----

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1005192
Received on 2009-01-05 14:32:00 CET

This is an archived mail posted to the Subversion Dev mailing list.