Hi,
We've run into a scenario where we see close to 100% CPU usage by
svnserve.exe after upgrading to Subversion 1.8.5 (coming from 1.4.x). The
service never appears to recover (we have waited 30 minutes of wall
time), so we end up forcing a restart of the service.
The environment is a Windows 2003 SP2 (x86) machine. We are using the
binaries provided by the Win32SVN project on SourceForge (http://sourceforge.net/projects/win32svn/,
specifically svn-win32-1.8.5.zip as found under /1.8.5/apache22); however,
we have also seen similar behavior when using the CollabNet Subversion
Server binaries (CollabNetSubversion-server-1.8.5-1-Win32). The machine
is a Hyper-V virtual machine with 3GB of RAM allocated to it (PAE
enabled) and an average of 1.2GB in use. The host CPU is a Xeon X5660,
with 2 processors exposed to this virtual machine.
The machine hosts several Subversion repositories using multiple
instances of svnserve running on different ports. We have experienced
this issue on only two of these instances; however, these two instances
serve the most active repositories in terms of build servers and
developers querying them.
The Subversion repository that is experiencing the issue most
frequently is an FSFS repository with ~76400 revisions that includes
several release branches, feature branches, and even full-blown trunk
lines (my predecessor designed the system to try to encompass our
entire product, spanning 5 major versions). Each branch is approximately
75,000 files and 10,000 folders, and the folder structure is very deep
as well. A working copy weighs in at around 2.5GB. The repository itself
sits at around 10GB. Approximately 100 developers are authorized to
access the system (25 of whom are our most active), committing at
various levels. Currently it averages around 50-75 commits per day
during the work week (more as the branch approaches).
The symptoms start with several of our continuous integration build
servers failing (due to Subversion queries timing out). Logging onto
the Subversion server and opening up Process Explorer will show
~48-50% CPU usage for svnserve (because this machine is dual core, this
indicates that we're saturating a single processor). I have captured
several minidumps of the process when it is in this state and can
provide them upon request.
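
To spell out the arithmetic behind that reading: Process Explorer shows a process's CPU usage as a share of all logical processors, so one fully busy thread on a 2-processor machine appears as roughly 50%. A minimal sketch in Python (the function name is made up for illustration):

```python
def saturated_cores(process_cpu_percent, logical_processors):
    """Convert a machine-wide CPU percentage into an equivalent
    number of fully busy processors."""
    return process_cpu_percent / 100.0 * logical_processors


# With the 2 processors exposed to this VM, a 48-50% reading
# corresponds to roughly one processor being pegged.
for reading in (48, 50):
    print(reading, "% of 2 processors ->", saturated_cores(reading, 2), "busy")
```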
The repository does have several complex pre/post-commit hooks.
Ideally I would have disabled these to validate that they are not at
fault. However, because this is happening sporadically and only in
production, it is not feasible to do so (I apologize). While I do not
believe they are the culprit, in the spirit of full disclosure: they are
C# programs, and the only area where I would express concern is that
these programs use svnlook.exe to grab information about files within
the repository. More information can be provided if needed.
The Win32SVN project provides PDBs as part of their drop. They were
the only project I could find that publicly does so (as an aside, if
any of the other projects do so I'd love to hear from them; as noted
above, I was able to repro it with the CollabNet binaries as well, and
would be happy to try anyone else's drop). However, the Win32SVN PDBs
are mismatched (or so claims Visual Studio 2012). WinDBG is able to load
the symbols if you set ".symopt+0x40" to force it to do so. I believe I
have found a common thread between each of these dumps: each time the
process is locked up, we seem to be stuck in a call stack that looks
similar to:
libsvn_subr_1!ensure_data_insertable+0x43 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_subr\cache-membuffer.c @ 1020]
libsvn_subr_1!membuffer_cache_set_internal+0xae [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_subr\cache-membuffer.c @ 1400]
libsvn_subr_1!membuffer_cache_set+0xbf [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_subr\cache-membuffer.c @ 1483]
libsvn_subr_1!svn_membuffer_cache_set+0x57 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_subr\cache-membuffer.c @ 2017]
libsvn_subr_1!svn_cache__set+0x30 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_subr\cache.c @ 110]
libsvn_fs_1!svn_fs_fs__rep_contents_dir+0x97 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\fs_fs.c @ 5685]
libsvn_fs_1!svn_fs_fs__rep_contents_dir_entry+0x77 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\fs_fs.c @ 5728]
libsvn_fs_1!svn_fs_fs__dag_dir_entry+0x5b [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\dag.c @ 446]
libsvn_fs_1!dir_entry_id_from_node+0x19 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\dag.c @ 312]
libsvn_fs_1!svn_fs_fs__dag_open+0x1c [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\dag.c @ 1160]
libsvn_fs_1!open_path+0x163 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\tree.c @ 1006]
libsvn_fs_1!get_dag+0x83 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\tree.c @ 1215]
libsvn_fs_1!fs_file_length+0x1c [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs_fs\tree.c @ 2627]
libsvn_fs_1!svn_fs_file_length+0x1a [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_fs\fs-loader.c @ 1182]
svnserve!get_dir+0x51a [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\svnserve\serve.c @ 1746]
svnserve!svn_ra_svn__handle_commands2+0xa7 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\libsvn_ra_svn\marshal.c @ 1494]
svnserve!serve+0x53b [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\svnserve\serve.c @ 3678]
svnserve!serve_thread+0x18 [C:\Win32Svn\SVN-1.8.5-22\src-1.8.5\subversion\svnserve\svnserve.c @ 425]
libapr_1!dummy_worker+0x20 [C:\Win32Svn\SVN-1.8.5-22\httpd-2.2.25\srclib\apr\threadproc\win32\thread.c @ 79]
msvcrt!_endthreadex+0xa3
kernel32!BaseThreadStart+0x34
I have searched the mailing lists and have not found anyone reporting
anything that sounds similar to this. There seems to be some review of
the commit back in 2011, but no issues reported with it.
Again, I can provide each of the dumps should anyone want to look at
this for themselves. I will be the first to admit that I am not strong
in WinDBG. I ignored the main listening thread (which WinDBG wants to
blame) as well as several other threads which appear to be other
Subversion requests that are hanging.
The similarities are all in the call to membuffer_cache_set_internal
followed by ensure_data_insertable. Looking at the code, it appears
there is a lot of memory manipulation going on in there, and
specifically the area where it "hangs" is a while(1) loop. There are
comments in there stating that the loop WILL eventually terminate and
providing the rationale for why. I am not well-versed enough in C (my
background is C#) to see if anything pops out immediately, but I will
continue research on my end. The code in question has been there since
2010; there are some code changes from 9/2012, but again, given my
inexperience, nothing jumps out at me immediately.
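
To illustrate the general concern (this is NOT the Subversion code, only a simplified Python sketch with made-up names such as TinyCache and max_passes): an "evict until the new item fits" loop terminates only as long as every pass actually frees space. If the bookkeeping is ever inconsistent, an unguarded while(1) would spin forever, which would look much like what we are seeing. A guard turns the spin into a reportable failure:

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical fixed-capacity cache; not modelled on the
    internals of cache-membuffer.c."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # key -> size, oldest first

    def ensure_insertable(self, size, max_passes=10_000):
        """Evict oldest entries until `size` fits.
        Returns False instead of looping forever."""
        if size > self.capacity:
            return False  # can never fit, so do not enter the loop
        passes = 0
        while True:
            if self.capacity - self.used >= size:
                return True
            if not self.entries or passes >= max_passes:
                # Nothing left to evict (or too many passes):
                # the accounting must be off, so bail out.
                return False
            _, freed = self.entries.popitem(last=False)
            self.used -= freed
            passes += 1

    def set(self, key, size):
        """Insert or replace `key`; returns False if it cannot fit."""
        if key in self.entries:
            self.used -= self.entries.pop(key)
        if not self.ensure_insertable(size):
            return False
        self.entries[key] = size
        self.used += size
        return True
```

For example, with TinyCache(10), setting "a" with size 6 and then "b" with size 6 evicts "a" to make room. If `used` were corrupted to a value larger than the capacity while no entries remained, ensure_insertable would return False rather than hang.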
Alternatively, I am open to other suggestions, such as possible
memory corruption. The fact that these errors are seemingly random would
lend credence to this theory, but we have not had any other (reported)
issues with this virtual host.
I apologize for the length of this email, but I hope I have provided
enough information for anyone willing to assist in debugging.
Thank you for your time.
Thank you,
Ace Olszowka
Build Master
Computers Unlimited
aceo_at_cu.net
Received on 2014-01-27 21:45:44 CET