[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: packing race condition

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Mon, 05 Jan 2009 10:15:29 -0600

Hash: SHA1

Added the use of the write lock in r35031. Comments welcome.

- -Hyrum

Hyrum K. Wright wrote:
> It doesn't use the write lock currently, but modifying it to do so would
> probably solve the problem. When I originally wrote packing, I intended it to
> be single-threaded, and if somebody attempted to run two instances in parallel,
> well, Don't Do That. Using the write lock won't let 'svnadmin pack' run any
> quicker, (i.e., the process isn't parallelized between two concurrent
> instances), but it would prevent the race condition Daniel describes.
> When we were chatting on IRC the other day, I wasn't sure if upgrade used the
> write lock, or a read-write lock (does FSFS even have such a thing?) and I
> didn't have time to check. If we can get away with using the write-lock, which
> I believe we can, it seems to be the most maintainable solution going forward.
> As an aside, I've been thinking that letting pack notify of progress would be
> useful, especially for the big repositories on which it was intended to run.
> Thoughts?
> -Hyrum
> David Glasser wrote:
>> You mean svnadmin pack doesn't just work within the FSFS write lock? That
>> would seem logical to me... fits with svnadmin upgrade, eg.
>> --dave
>> On Jan 5, 2009 12:22 AM, "Daniel Shahaf" <d.s_at_daniel.shahaf.name> wrote:
>> Problem:
>> Race condition in 'svnadmin pack' may cause corruption (data loss).
>> Details:
>> See pack_shard() in fs_fs.c. Note the two svn_io_remove_dir2() calls.
>> Suppose two 'svnadmin pack' run in parallel, and that one of them
>> reaches the first call (which removes the pack dir) just after the first
>> of them reaches the second call (which removes the non-packed shard dir).
>> Result:
>> The whole shard (that the two 'svnadmin pack' worked on) is physically
>> deleted from the (OS) filesystem. Effectively, it does:
>> % rm -rf db/revs/5/
>> % rm -rf db/revs/5.pack/
>> and then (one of the svnadmin's) throws an error (because it tries to
>> pack the shard that had just been removed).
>> Proof:
>> I can reproduce this (on a testing shard_size=4 repository) by placing
>> breakpoints in the function around the remove_dir2() calls.
>> (s/place breakpoints/call svn_cmdline_prompt_user2/)
>> Fix:
>> Hyrum, who identified the problem on IRC, also suggested the solution of
>> adding a lock file to 'svnadmin pack'. That looks doable to me, although
>> it should require passing an FS object (svn_fs_t) to svn_fs_fs__pack (so
>> there is a mutex to pass to with_some_lock()).
>> I intend to look into a patch along these lines, but posting here for
>> comments on the approach.
>> Daniel
>> (For example, another way might be to abort if '5.pack' already exists
>> (rather than remove it and proceed). But would that work?)
Version: GnuPG v1.4.8 (Darwin)


Received on 2009-01-05 17:15:54 CET

This is an archived mail posted to the Subversion Dev mailing list.