[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: packing race condition

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Mon, 5 Jan 2009 20:55:59 +0200 (Jerusalem Standard Time)

Hyrum K. Wright wrote on Mon, 5 Jan 2009 at 10:15 -0600:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Added the use of the write lock in r35031. Comments welcome.
>

r35031 uses acquire_fs_mutex() and doesn't use svn_fs_fs__with_write_lock().
Now, maybe it's a stupid question, but shouldn't it be the other way around?

Daniel

> - -Hyrum
>
> Hyrum K. Wright wrote:
> > It doesn't use the write lock currently, but modifying it to do so would
> > probably solve the problem. When I originally wrote packing, I intended it to
> > be single-threaded, and if somebody attempted to run two instances in parallel,
> > well, Don't Do That. Using the write lock won't let 'svnadmin pack' run any
> > quicker, (i.e., the process isn't parallelized between two concurrent
> > instances), but it would prevent the race condition Daniel describes.
> >
> > When we were chatting on IRC the other day, I wasn't sure if upgrade used the
> > write lock, or a read-write lock (does FSFS even have such a thing?) and I
> > didn't have time to check. If we can get away with using the write-lock, which
> > I believe we can, it seems to be the most maintainable solution going forward.
> >
> > As an aside, I've been thinking that letting pack notify of progress would be
> > useful, especially for the big repositories on which it was intended to run.
> > Thoughts?
> >
> > -Hyrum
> >
> > David Glasser wrote:
> >> You mean svnadmin pack doesn't just work within the FSFS write lock? That
> >> would seem logical to me... fits with svnadmin upgrade, eg.
> >
> >> --dave
> >
> >> On Jan 5, 2009 12:22 AM, "Daniel Shahaf" <d.s_at_daniel.shahaf.name> wrote:
> >
> >> Problem:
> >> Race condition in 'svnadmin pack' may cause corruption (data loss).
> >
> >> Details:
> >> See pack_shard() in fs_fs.c. Note the two svn_io_remove_dir2() calls.
> >> Suppose two 'svnadmin pack' run in parallel, and that one of them
> >> reaches the first call (which removes the pack dir) just after the first
> >> of them reaches the second call (which removes the non-packed shard dir).
> >
> >> Result:
> >> The whole shard (that the two 'svnadmin pack' worked on) is physically
> >> deleted from the (OS) filesystem. Effectively, it does:
> >
> >> % rm -rf db/revs/5/
> >> % rm -rf db/revs/5.pack/
> >
> >> and then (one of the svnadmin's) throws an error (because it tries to
> >> pack the shard that had just been removed).
> >
> >> Proof:
> >> I can reproduce this (on a testing shard_size=4 repository) by placing
> >> breakpoints in the function around the remove_dir2() calls.
> >
> >> (s/place breakpoints/call svn_cmdline_prompt_user2/)
> >
> >> Fix:
> >> Hyrum, who identified the problem on IRC, also suggested the solution of
> >> adding a lock file to 'svnadmin pack'. That looks doable to me, although
> >> it should require passing an FS object (svn_fs_t) to svn_fs_fs__pack (so
> >> there is a mutex to pass to with_some_lock()).
> >
> >> I intend to look into a patch along these lines, but posting here for
> >> comments on the approach.
> >
> >> Daniel
> >
> >> (For example, another way might be to abort if '5.pack' already exists
> >> (rather than remove it and proceed). But would that work?)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
>
> iEYEARECAAYFAkliMiEACgkQCwOubk4kUXzwxQCgrsZeZU9mo4ynGTlmnQs8G/Yk
> 0ekAnjJzkHomUbZU6XakJ2H2tELldmzz
> =szwS
> -----END PGP SIGNATURE-----
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1005881
Received on 2009-01-05 20:07:34 CET

This is an archived mail posted to the Subversion Dev mailing list.