-----BEGIN PGP SIGNED MESSAGE-----
Added the use of the write lock in r35031. Comments welcome.
Hyrum K. Wright wrote:
> It doesn't use the write lock currently, but modifying it to do so would
> probably solve the problem. When I originally wrote packing, I intended it to
> be single-threaded, and if somebody attempted to run two instances in parallel,
> well, Don't Do That. Using the write lock won't let 'svnadmin pack' run any
> quicker, (i.e., the process isn't parallelized between two concurrent
> instances), but it would prevent the race condition Daniel describes.
> When we were chatting on IRC the other day, I wasn't sure if upgrade used the
> write lock, or a read-write lock (does FSFS even have such a thing?) and I
> didn't have time to check. If we can get away with using the write-lock, which
> I believe we can, it seems to be the most maintainable solution going forward.
> As an aside, I've been thinking that letting pack notify of progress would be
> useful, especially for the big repositories on which it was intended to run.
> David Glasser wrote:
>> You mean svnadmin pack doesn't just work within the FSFS write lock? That
>> would seem logical to me... fits with svnadmin upgrade, eg.
>> On Jan 5, 2009 12:22 AM, "Daniel Shahaf" <d.s_at_daniel.shahaf.name> wrote:
>> Race condition in 'svnadmin pack' may cause corruption (data loss).
>> See pack_shard() in fs_fs.c. Note the two svn_io_remove_dir2() calls.
>> Suppose two 'svnadmin pack' run in parallel, and that one of them
>> reaches the first call (which removes the pack dir) just after the first
>> of them reaches the second call (which removes the non-packed shard dir).
>> The whole shard (that the two 'svnadmin pack' worked on) is physically
>> deleted from the (OS) filesystem. Effectively, it does:
>> % rm -rf db/revs/5/
>> % rm -rf db/revs/5.pack/
>> and then (one of the svnadmin's) throws an error (because it tries to
>> pack the shard that had just been removed).
>> I can reproduce this (on a testing shard_size=4 repository) by placing
>> breakpoints in the function around the remove_dir2() calls.
>> (s/place breakpoints/call svn_cmdline_prompt_user2/)
>> Hyrum, who identified the problem on IRC, also suggested the solution of
>> adding a lock file to 'svnadmin pack'. That looks doable to me, although
>> it should require passing an FS object (svn_fs_t) to svn_fs_fs__pack (so
>> there is a mutex to pass to with_some_lock()).
>> I intend to look into a patch along these lines, but posting here for
>> comments on the approach.
>> (For example, another way might be to abort if '5.pack' already exists
>> (rather than remove it and proceed). But would that work?)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
-----END PGP SIGNATURE-----
Received on 2009-01-05 17:15:54 CET