Hi Rick,
On Mon, Aug 13, 2007 at 01:59:24PM -0700, Rick Jones wrote:
> It would seem I've had a few cases where 1.4.2 will attempt to lstat a
> directory after it has rmdir'ed that directory. These seem to correlate
> completely with error messages such as:
>
> raj@tardy:~/netperf2_trunk$ svn commit -m "initial stab at measure CPU but
> only confidence on result change
> > "
> Sending src/netlib.c
> Sending src/netsh.c
> Sending src/netsh.h
> Sending src/nettest_bsd.c
> Transmitting file data ....svn: Commit failed (details follow):
> svn: MERGE request failed on '/svn/netperf2/trunk/src'
> svn: Can't read directory '/svn/netperf2/db/transactions/127-1.txn': Partial
> results are valid but processing is incomplete
>
> reported by the client (also 1.4.2) on a commit. In this case the server is
> PA-RISC Debian "testing" and the client is x86 Debian testing. I have
> tried to compile the 1.4.4 bits from unstable, but on PA-RISC that Debian
> package will not compile. Of course I've no idea if what I'm seeing is
> something already fixed in 1.4.4 - my perusal of the release notes, while
> providing some intriguing entries, found nothing that appeared to be an
> exact match.
>
Preethi (cc'd) sent me an strace off-list, and David Anderson and I took
a look at it this morning.
So, the main problem is this part of the strace. This is right at the
end of the commit process, just as we're removing the old transaction
directory:
getdents(13, {{d_ino=1355922, d_off=12, d_reclen=12, d_name="."}
{d_ino=1254356, d_off=24, d_reclen=16, d_name=".."} {d_ino=1355964,
d_off=52, d_reclen=20, d_name="node.0.0"} {d_ino=1356276, d_off=68,
d_reclen=20, d_name="rev-lock"} {d_ino=1356278, d_off=84, d_reclen=20,
d_name="changes"} {d_ino=1356280, d_off=116, d_reclen=20,
d_name="next-ids"} {d_ino=1356282, d_off=132, d_reclen=20,
d_name="node.3.0"} {d_ino=1356283, d_off=160, d_reclen=28,
d_name="node.0.0.children"} {d_ino=1356284, d_off=180, d_reclen=20,
d_name="node.1t.0"} {d_ino=1356285, d_off=4096, d_reclen=28,
d_name="node.3.0.children"}}, 4096) = 204
lstat64("/svn/netperf2/db/transactions/131-1.txn/.", ...) = 0
lstat64("/svn/netperf2/db/transactions/131-1.txn/..", ...) = 0
lstat64("/svn/netperf2/db/transactions/131-1.txn/node.0.0", ...) = 0
unlink("/svn/netperf2/db/transactions/131-1.txn/node.0.0") = 0
[ ditto for rev-lock, changes, next-ids, node.3.0 ]
lstat64("/svn/netperf2/db/transactions/131-1.txn/node.0.0.children",
...) = 0
unlink("/svn/netperf2/db/transactions/131-1.txn/node.0.0.children") = 0
lstat64("/svn/netperf2/db/transactions/131-1.txn/node.1t.0children",
0x419dbe88) = -1 ENOENT (No such file or directory)
That last filename should be "node.1t.0", as you can see from the
results of the getdents() syscall. Since APR just concatenates the
dirname and dentry name it got from readdir() and passes it to lstat(),
I'm failing to see where the problem could be coming from - unless
there's a bug in PA-RISC's APR or libc.
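
For anyone who wants to check the readdir()/lstat() round trip outside of Apache and Subversion, here is a minimal sketch of the same lstat-then-unlink loop. It is not APR's code; it is Python standing in for it, on the assumption that os.listdir() ends up in the platform's readdir(). The function name and the idea of pointing it at a scratch directory are mine. If it reports failures on the PA-RISC box for names that plainly exist, that would point at libc or the kernel rather than at APR.

```python
import os
import tempfile


def remove_dir_contents(dirpath):
    """Mimic the lstat-then-unlink loop from the strace.

    Returns a list of (path, errno) pairs for entries whose lstat() failed.
    Hypothetical helper for illustration; not part of APR or Subversion.
    """
    failures = []
    # os.listdir() omits "." and "..", unlike the raw getdents() output.
    for name in os.listdir(dirpath):
        # Same idea as described above: directory name + entry name.
        path = os.path.join(dirpath, name)
        try:
            os.lstat(path)
        except OSError as exc:
            failures.append((path, exc.errno))
            continue
        os.unlink(path)
    return failures


def make_scratch_txn_dir():
    """Create a scratch directory holding the file names seen in the strace."""
    names = ["node.0.0", "rev-lock", "changes", "next-ids", "node.3.0",
             "node.0.0.children", "node.1t.0", "node.3.0.children"]
    scratch = tempfile.mkdtemp(suffix=".txn")
    for name in names:
        open(os.path.join(scratch, name), "w").close()
    return scratch
```

Run remove_dir_contents(make_scratch_txn_dir()) on the affected server; an empty list means every name that came back from the directory listing could be lstat()ed and removed.
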
After we get APR_INCOMPLETE from apr_dir_read(), we return from the
commit with that error, abort the transaction (which successfully -- at
least, in the strace I saw -- deletes all the remaining files and the
dir) and finally we return the APR_INCOMPLETE error back to the client.
The client responds with a DELETE of the activity, which is the part
that's doing the final lstat() of the (now non-existent) transaction
directory -- but that error is expected and ignored.
I'm a bit stuck at this point, but there are some things that it would
be useful to find out:
- Server's distro, libc version, apr version, subversion version
  (and whether each is a vanilla or distro-provided version).
- Which apache modules the server is running.
- Whether the error is consistently repeatable, and whether there's any
  pattern to the failures, or whether everyone's seeing them.
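
If it helps, most of the version information above can be gathered in one go with something like the sketch below. The command names are assumptions about a Debian-style install (apache2ctl may be apachectl elsewhere, and apr-config may be apr-1-config depending on the APR version); anything that is not on the PATH is reported as None rather than guessed at. Whether each package is vanilla or distro-provided still has to be answered by hand.

```python
import platform
import shutil
import subprocess

# Candidate commands. Which of these exist depends on the distro and
# on how the packages were built (assumption, adjust as needed).
COMMANDS = {
    "subversion": ["svn", "--version", "--quiet"],
    "apr": ["apr-config", "--version"],
    "apache modules": ["apache2ctl", "-M"],
}


def run_if_present(argv):
    """Return the command's combined output, or None if it is not on PATH."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout.strip()


def collect_report():
    """Collect libc, architecture, and tool versions into a dict."""
    report = {
        "libc": " ".join(platform.libc_ver()),
        "machine": platform.machine(),
    }
    for label, argv in COMMANDS.items():
        report[label] = run_if_present(argv)
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print("%s: %s" % (key, value))
```
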
Regards,
Malcolm
Received on Fri Aug 17 19:48:43 2007