--- Philip Martin <philip@codematters.co.uk> wrote:
> root <ebay_101011_0x2b@yahoo.com> writes:
>
> > I've been trying to track down an svnserve hang issue
> > for some time. The details are below:
> >
> > The problem is that a client checkout over the network
> > (via svn: svn+ssh: or http: or https:) has the problem
> > that for large repositories it eventually hangs on
> > checkout.
>
> Try running the client on the server, i.e. use
> svn://localhost/ or
> http://localhost/. That should help to determine if
> it is a network
> problem.
A file:/// checkout works fine. The hang only happens over
the network, with any of https://, svn+ssh://, or svn://,
and with either the Win32 command-line svn binary or the
TortoiseSVN binary.
>
> > imports and checkins work fine. the system
> > details and version numbers of all the software are in
> > a previous email to the list under the same subject
> > heading.
>
> You keep breaking the email threads, which makes it
> awkward when you
> refer to "a previous email".
Sorry, I will try to keep the thread intact.
>
> > A recent trial run where the problem
> > happened is logged below.
> >
> > My biggest sticking point is figuring out where the
> > code jumps to in the gdb backtrace. It seems to jump
> > off to 0xffffe002 -- where debug symbols do not exist;
> > It is not apr code.
>
> The calling stack frame appears to make an apr call,
> and apr is what
> calls read/write shown in the strace output.
>
> > I completely rebuilt apache and
> > apr included with apache with debug symbols.
>
> Are you sure you built/installed it correctly?
Yes, I am positive. Running ldd on the svnserve binary
shows which libraries it uses, and it points to the apr
shared objects that were just built. I know this because
I ran rm -rf /usr/local/apache2 beforehand, then ran
make install for apache, and the ldd output shows
svnserve using /usr/local/apache2/..
ldd /usr/local/bin/svnserve.bin
        libsvn_repos-1.so.0 => /usr/local/lib/libsvn_repos-1.so.0 (0x40017000)
        libsvn_fs-1.so.0 => /usr/local/lib/libsvn_fs-1.so.0 (0x4002a000)
        libsvn_delta-1.so.0 => /usr/local/lib/libsvn_delta-1.so.0 (0x4004b000)
        libsvn_subr-1.so.0 => /usr/local/lib/libsvn_subr-1.so.0 (0x40053000)
        libsvn_ra_svn-1.so.0 => /usr/local/lib/libsvn_ra_svn-1.so.0 (0x40074000)
        libaprutil-0.so.0 => /usr/local/apache2/lib/libaprutil-0.so.0 (0x40082000)
        libgdbm.so.2 => /usr/lib/libgdbm.so.2 (0x400a9000)
        libdb-4.0.so => /lib/libdb-4.0.so (0x400b0000)
        libexpat.so.0 => /usr/lib/libexpat.so.0 (0x40158000)
        libapr-0.so.0 => /usr/local/apache2/lib/libapr-0.so.0 (0x40178000)
        librt.so.1 => /lib/tls/librt.so.1 (0x40197000)
        libm.so.6 => /lib/tls/libm.so.6 (0x401a9000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x401cc000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x401f9000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x4020e000)
        libdl.so.2 => /lib/libdl.so.2 (0x4021c000)
        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
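
The check described above (confirming that the apr libraries resolve to the freshly installed /usr/local/apache2 prefix) could also be scripted. The following is only a minimal sketch: the function names parse_ldd and libs_outside are made up for illustration, and the sample text is three lines copied from the ldd output above.

```python
def parse_ldd(text):
    """Map each library name in ldd output to the path it resolved to."""
    resolved = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        parts = rest.split()
        # Skip entries that carry only a load address and no path.
        if parts and not parts[0].startswith("("):
            resolved[name.strip()] = parts[0]
    return resolved


def libs_outside(resolved, wanted, prefix):
    """Return the wanted libraries that are missing or not under prefix."""
    return [lib for lib in wanted
            if not resolved.get(lib, "").startswith(prefix)]


# Three lines taken from the ldd output above.
SAMPLE = """\
libaprutil-0.so.0 => /usr/local/apache2/lib/libaprutil-0.so.0 (0x40082000)
libapr-0.so.0 => /usr/local/apache2/lib/libapr-0.so.0 (0x40178000)
libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
"""

if __name__ == "__main__":
    found = parse_ldd(SAMPLE)
    stray = libs_outside(found, ["libapr-0.so.0", "libaprutil-0.so.0"],
                         "/usr/local/apache2/")
    print("apr libraries outside the prefix:", stray)
```

In practice the text would come from running ldd on the binary. An empty list would suggest that both apr libraries are loaded from the rebuilt install.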
>
> > (gdb) bt
> > #0 0xffffe002 in ?? ()
> > #1 0x4007da8c in writebuf_output (conn=0x8055418,
> > pool=0x80b9cf8,
> > data=0x404824d7
> > "Ð\t¯\206¶`\234áLeûíÙðWJ,69gÖ\214k[.G\003ë
> >
>
Ì\232²\023\217çMX%ÿÎt?%\036]-\020\236z``¥\201\206úà@\020i\017:ÒK\a-\235¬C3k@Î\034*DÝ\207¡¬#S´\230EkÝ®\e%üx\030\205±_\031\nÝY\002%^ãÂã£\215|\023h\021ÔQÈ\225}å£fÊw·\213ç\220µÐ\210\200Ô¢sJ",
> >
> > len=98420) at subversion/libsvn_ra_svn/marshal.c:157
>
> Look at marshal.c:157, it calls apr_file_write. Do
> you get something
> like
>
> (gdb) p apr_file_write
> $1 = {apr_status_t (apr_file_t *, const void *,
> apr_size_t *)} 0x10b50 <apr_file_write>
>
> or does gdb complain about "no debug info"?
No, it has debug info, but that is not what it's
calling. Here is my output:
(gdb) p apr_file_write
$1 = {apr_status_t (apr_file_t *, const void *, apr_size_t *)} 0x40187e98 <apr_file_write>
(gdb) p apr_file_read
$2 = {apr_status_t (apr_file_t *, void *, apr_size_t *)} 0x40187bc8 <apr_file_read>
(gdb) p readbuf_input
$3 = {svn_error_t *(svn_ra_svn_conn_t *, char *, apr_size_t *)} 0x4007dd52 <readbuf_input>
It always jumps to 0xffffe002. That address may not even
be valid, which could be why it hangs. It is nowhere near
the range of addresses where the apr calls live, so
either it is some other shared library or it is jumping
off into the weeds for some reason. How can I figure out
whether that is library code, and which library it is, so
that I can build it with debug symbols and set a
breakpoint there? Also note that 0xffffe002 is nowhere
near the range of addresses for any of the shared objects
in the ldd output above. What could make it leap off a
plank here?
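
One possible way to answer the "which library is this address in" question on Linux is to compare the address against the process's memory map in /proc/<pid>/maps. The sketch below only illustrates the lookup: parse_maps and find_mapping are hypothetical helper names, and the sample map text is made up (the start addresses match the ldd output above, but the end addresses, offsets, devices, and inodes are invented).

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps style text into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        # Mappings without a backing file have no sixth field.
        name = fields[5].strip() if len(fields) > 5 else "[anonymous]"
        regions.append((start, end, name))
    return regions


def find_mapping(regions, addr):
    """Return the name of the mapping containing addr, or None."""
    for start, end, name in regions:
        if start <= addr < end:
            return name
    return None


# Made-up sample; a real run would read /proc/<pid>/maps for svnserve.
SAMPLE_MAPS = """\
40074000-40082000 r-xp 00000000 03:02 1001 /usr/local/lib/libsvn_ra_svn-1.so.0
40178000-40197000 r-xp 00000000 03:02 1002 /usr/local/apache2/lib/libapr-0.so.0
42000000-4212e000 r-xp 00000000 03:02 1003 /lib/tls/libc.so.6
"""

if __name__ == "__main__":
    regions = parse_maps(SAMPLE_MAPS)
    for addr in (0x40187e98, 0x4007dd52, 0xffffe002):
        print(hex(addr), "->", find_mapping(regions, addr))
```

With this sample, the apr_file_write and readbuf_input addresses from the gdb output fall inside libapr and libsvn_ra_svn respectively, while 0xffffe002 matches nothing. An address that matches no file-backed mapping in the real map would suggest it does not belong to any of the shared objects listed by ldd. gdb's "info sharedlibrary" command gives a similar view of the loaded address ranges.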
On a side note, would ntp (network time protocol) affect
subversion in any way? I do have ntp running, but I do
not think it is the problem, even if a time update did
occur during a checkout, simply because this problem is
100% repeatable at the same point in any repository.
That is, if I check out reposA it hangs at the same file
every time, and if I check out reposB it hangs at the
same file every time, and so on. Also, it seems to hang
only on binaries: PDF files, JPG files, compressed HTML
binaries, etc. Note that it does not hang on every binary
file. It will successfully check out a number of JPGs or
PDFs and then hang at the same place every time.
>
> You still haven't shown any gdb/strace output for
> the client, so I can
> only guess what the client is doing, I assume it's
> blocked in a read.
Sorry, the client is Win32. I can try to rebuild the
client binaries. For now I was just using the latest
builds from tigris.org for the command-line svn and
TortoiseSVN. I will try to build the clients.
> If that's the case then the client is blocked in a
> read and the server
> is blocked in a write. I don't know what to suggest
> other than a
> network problem, particularly since you say it
> affects both http://
> and svn://.
I have a hard time accepting that it is just a network
problem. SSH and every other protocol I run on the LAN
here (Samba, NFS, FTP) seem to have no problems. I have
never seen any such hang with other applications. The
socket is still connected. I just need to figure out
where 0xffffe002 goes.
>
> --
> Philip Martin
Received on Tue Feb 3 06:23:21 2004