[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r36663 - trunk/subversion/libsvn_wc

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 19 Mar 2009 10:10:17 +0100

On Thu, Mar 19, 2009 at 04:19, Hyrum K. Wright
<hyrum_wright_at_mail.utexas.edu> wrote:
> On Mar 18, 2009, at 9:18 PM, Greg Stein wrote:
>> On Wed, Mar 18, 2009 at 22:23, Hyrum K. Wright <hyrum_at_hyrumwright.org>
>> wrote:
>>> ...
>>> +++ trunk/subversion/libsvn_wc/lock.c   Wed Mar 18 14:23:56 2009
>>>  (r36663)
>>> @@ -85,11 +85,9 @@ struct svn_wc_adm_access_t
>>>  /* SET_OWNER is TRUE if SET is allocated from this access baton */
>>>  svn_boolean_t set_owner;
>>> -  /* ENTRIES is the cached entries for PATH, without those in state
>>> -     deleted. ENTRIES_HIDDEN is the cached entries including those in
>>> -     state deleted or state absent. Either may be NULL. */
>>> -  apr_hash_t *entries;
>>> -  apr_hash_t *entries_hidden;
>>> +  /* ENTRIES_HIDDEN is all cached entries including those in
>>> +     state deleted or state absent. It may be NULL. */
>>> +  apr_hash_t *entries_all;
>> Comment is wrong. Should ref ENTRIES_ALL.
>>> ...
>>> +static apr_hash_t *
>>>  prune_deleted(svn_wc_adm_access_t *adm_access,
>>> -              apr_pool_t *pool)
>>> +              apr_pool_t *scratch_pool)
>>>  {
>>> ...
>>> +  /* Construct pruned hash without deleted entries */
>>> +  entries = apr_hash_make(adm_access->pool);
>> And... BOOM. Every single time some code calls
>> svn_wc_entries_read(show_hidden=FALSE) this will drop more memory into
>> adm_access->pool. And that fact is a documented part of the API.
>> Callers have no incentive to bother caching the value since it is
>> "known" that multiple calls will not grow that memory pool.
>> To maintain backwards compatibility, you have to continue using that
>> pool (not the passed-in pool which is a scratch_pool, and therefore
>> probably not of the correct lifetime).
>> I honestly do not see how you're going to get around this.
>> Now... to some extent if API users are making modifications, then the
>> old entries will get tossed and the new entries will take their place.
>> Memory in that pool will grow as a function of *writes* rather than
>> simply *reads*.
>> I sympathize, and agree that the old API/behavior/definition blows.
>> But I don't see much wiggly room here. Do you have some ideas that I'm
>> missing?
> First let me say that the API completely blows.  The fact that we have an
> *external* API which is so implementation dependent is mind-boggling.  The
> entries cache is documented as just that...a cache.  But the behavior we are
> discovering is anything but cache-like, unfortunately.  Why we publicly
> document the usage of opaque members of a private struct is beyond me.
>  </rant>

Oh, agreed. And very easy to see in hindsight.

> A few more questions:
>  * Does this have any known effects outside of libsvn_wc?  Could it be
> possible that this is a theoretical possibility, but of practical nonimport?
>  (A little part of me hopes so, but doubts it.)

svn_wc_entries_read() is used within libsvn_wc (42 uses) and 5 calls
in libsvn_client. The calls in client are safe, per my other email.

Calls by third-party libraries should not be altering those entry
structures or the hash itself, per the docstring. As a result, I'm not
worried about any third-party code breaking.

>  * Does this qualify as errata worthy?

No. The docstring says "do not modify". Nobody should be *relying* on
being able to do so. It is not part of our published API.

>  * Hmm, the thought just occurs to me: is there any chance we could use a
> subpool here?  Just allocate the output data in a subpool of the access
> baton pool?  ("Sure, but when would you clear it?" says the other half of my
> brain.)

Thought of that, but the doc says the returned hash will live as long
as the baton. So we can't just clear it out whenever we'd like.

One thing that I've found: when constructing the *pruned* entries
hash, it does not copy the entries themselves. Only a new hashtable is
constructed, with pointers to the original entries. I'd need to do
some more investigation to determine when/conditions for new entries
being allocated in the baton's pool. Point is: the growth of the
baton's pool as a result of multiple calls to svn_wc_entries_read()
might not be too bad *as long as* we cache/hold the hash table.

If you take your next step, and read it upon each call, then we're
going to have serious memory problems. Calls to svn_wc_entry() will
re-read an entire hash table just to fetch one entry.

I believe we must continue to cache/hold that hash table in the baton.

> I think we both feel the Right Way to do this is an API which includes a
> result pool in
> which the output entries are allocated.  Getting there from here could be
> difficult.

Well... the svn_wc_entry_t structure will be deprecated in 1.7. So the
"Right" API style is moot.

Received on 2009-03-19 10:10:36 CET

This is an archived mail posted to the Subversion Dev mailing list.