[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn propget -R on the root of a large repository == boom!

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Mon, 21 Apr 2008 15:47:56 -0400

"Peter Wemm" <peter_at_wemm.org> writes:
> Suppose you have a large svn repository and people who like inviting
> trouble. An example of 'large' in my case is the output (fsfs) of
> cvs2svn of the freebsd.org src repository. There are only 180K
> changes, but there are many thousands of branches and tags and copies.
> The namespace is quite large and convoluted.
>
> Running 'svn propget -R svn:author file:///path/to/repo' is a good
> way to find out how your machine handles running out of swap space...
> In my case, the svn process had reached *19GB* before something broke.

[There's a patch at the end of this mail, for you to test.]

Two things are interesting here:

   1. Seems like we're holding the accumulated results (or accumulated
      intermediate stages) of a recursive propget in memory, instead of
      streaming like one would expect, and

   2. This memory bloat happens even when there are no results, as there
      shouldn't be in your example, since you didn't pass the --revprop
      flag and "svn:author" is probably not on any files or directories.
      (IOW, there's a mistake in your command, but it happened to be a
      revealing mistake for our purposes.)

> I know there's a mod_dontdothat.so for apache, but that doesn't do
> much for local access or svnserve. (I'm thinking of having the setgid
> svnserve wrapper lower the resource limits as a mitigation.)
>
> Just FYI. This is svn-1.5 as of a checkout from the 1.5 branch on
> friday sometime. svnversion says r30712.

So, I think the problem is in

https://svn.collab.net/repos/svn/trunk/subversion/libsvn_client/prop_commands.c

in the function remote_propget(). First we call

      SVN_ERR(svn_ra_get_dir2(ra_session,
                              (depth >= svn_depth_files ? &dirents : NULL),
                              NULL, &prop_hash, target_relative, revnum,
                              SVN_DIRENT_KIND, pool));

...passing the caller's pool directly. So that fetches a bunch of props
into 'prop_hash', all of which have been allocated in 'pool'. From
those, we pick just the one we want and assign it into our returnable
'props' hash, keyed on path:

  apr_hash_set(props,
               svn_path_join(target_prefix, target_relative, pool),
               APR_HASH_KEY_STRING,
               apr_hash_get(prop_hash, propname, APR_HASH_KEY_STRING));

Note that this too uses 'pool', but here it's necessary, because we need
the property to last outside the call. (However, we could have copied
it from a subpool into the main pool.)

So far, these sins have been venial not mortal, because it was just a
one-time fetch. But now we loop recursively...

  if (depth >= svn_depth_files
      && kind == svn_node_dir
      && apr_hash_count(dirents) > 0)
    {
      apr_hash_index_t *hi;

      for (hi = apr_hash_first(pool, dirents);
           hi;
           hi = apr_hash_next(hi))
        {
           [...]
        }
    }

...and still no iteration pool! :-)

Can you let me know how the patch below works for you? It's passing
'make check' so far for me, and has gotten past prop_tests.py, so I
think I've at least avoided obvious typos :-). However, I don't have a
good dataset for reproducing the kind of bloat you experienced.

[[[
Fix a memory leak in remove recursive propget. See thread starting here:

   http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=137529
   From: "Peter Wemm" <peter_at_wemm.org>
   To: dev_at_subversion.tigris.org
   Subject: svn propget -R on the root of a large repository == boom!
   Date: Mon, 21 Apr 2008 02:12:54 -0700
   Message-ID: <e7db6d980804210212h49b2a5a2g5b99093d2ec437a7_at_mail.gmail.com>

* subversion/libsvn_client/prop_commands.c
  (remote_propget): Take perm_pool and work_pool arguments, instead of
    just one pool, and use them to manage memory more efficiently.
]]]

Index: subversion/libsvn_client/prop_commands.c
===================================================================
--- subversion/libsvn_client/prop_commands.c (revision 30740)
+++ subversion/libsvn_client/prop_commands.c (working copy)
@@ -628,7 +628,9 @@
  * KIND is the kind of the node at "TARGET_PREFIX/TARGET_RELATIVE".
  * Yes, caller passes this; it makes the recursion more efficient :-).
  *
- * Allocate the keys and values in POOL.
+ * Allocate the keys and values in PERM_POOL, but do all temporary
+ * work in WORK_POOL. The two pools can be the same; recursive
+ * calls may use a different WORK_POOL, however.
  */
 static svn_error_t *
 remote_propget(apr_hash_t *props,
@@ -639,7 +641,8 @@
                svn_revnum_t revnum,
                svn_ra_session_t *ra_session,
                svn_depth_t depth,
- apr_pool_t *pool)
+ apr_pool_t *perm_pool,
+ apr_pool_t *work_pool)
 {
   apr_hash_t *dirents;
   apr_hash_t *prop_hash;
@@ -649,41 +652,47 @@
       SVN_ERR(svn_ra_get_dir2(ra_session,
                               (depth >= svn_depth_files ? &dirents : NULL),
                               NULL, &prop_hash, target_relative, revnum,
- SVN_DIRENT_KIND, pool));
+ SVN_DIRENT_KIND, work_pool));
     }
   else if (kind == svn_node_file)
     {
       SVN_ERR(svn_ra_get_file(ra_session, target_relative, revnum,
- NULL, NULL, &prop_hash, pool));
+ NULL, NULL, &prop_hash, work_pool));
     }
   else if (kind == svn_node_none)
     {
       return svn_error_createf
         (SVN_ERR_ENTRY_NOT_FOUND, NULL,
          _("'%s' does not exist in revision %ld"),
- svn_path_join(target_prefix, target_relative, pool), revnum);
+ svn_path_join(target_prefix, target_relative, work_pool), revnum);
     }
   else
     {
       return svn_error_createf
         (SVN_ERR_NODE_UNKNOWN_KIND, NULL,
          _("Unknown node kind for '%s'"),
- svn_path_join(target_prefix, target_relative, pool));
+ svn_path_join(target_prefix, target_relative, work_pool));
     }
 
- apr_hash_set(props,
- svn_path_join(target_prefix, target_relative, pool),
- APR_HASH_KEY_STRING,
- apr_hash_get(prop_hash, propname, APR_HASH_KEY_STRING));
+ {
+ svn_string_t *val = apr_hash_get(prop_hash, propname,
+ APR_HASH_KEY_STRING);
+ if (val)
+ val = svn_string_dup(val, perm_pool);
 
+ apr_hash_set(props,
+ svn_path_join(target_prefix, target_relative, perm_pool),
+ APR_HASH_KEY_STRING, val);
+ }
 
   if (depth >= svn_depth_files
       && kind == svn_node_dir
       && apr_hash_count(dirents) > 0)
     {
       apr_hash_index_t *hi;
+ apr_pool_t *iterpool = svn_pool_create(work_pool);
 
- for (hi = apr_hash_first(pool, dirents);
+ for (hi = apr_hash_first(work_pool, dirents);
            hi;
            hi = apr_hash_next(hi))
         {
@@ -694,6 +703,8 @@
           const char *new_target_relative;
           svn_depth_t depth_below_here = depth;
 
+ svn_pool_clear(iterpool);
+
           apr_hash_this(hi, &key, NULL, &val);
           this_name = key;
           this_ent = val;
@@ -705,7 +716,7 @@
             depth_below_here = svn_depth_empty;
 
           new_target_relative = svn_path_join(target_relative,
- this_name, pool);
+ this_name, iterpool);
 
           SVN_ERR(remote_propget(props,
                                  propname,
@@ -715,8 +726,10 @@
                                  revnum,
                                  ra_session,
                                  depth_below_here,
- pool));
+ perm_pool, iterpool));
         }
+
+ svn_pool_destroy(iterpool);
     }
 
   return SVN_NO_ERROR;
@@ -866,7 +879,7 @@
 
       SVN_ERR(remote_propget(*props, propname, url, "",
                              kind, revnum, ra_session,
- depth, pool));
+ depth, pool, pool));
     }
 
   if (actual_revnum)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-04-22 08:54:49 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.