Hi Stefan,
Thanks for the constructive review! I accept your nits, but have one comment:
> You can use svn_sort_compare_items_lexically because entries are single path
> components that do not contain slashes. In which case
> svn_sort_compare_items_as_paths produces the same result but is less
> efficient.
The paths are like:
trunk/documents/papers/archetypes/paper_plots/103lg.ps
But I still believe you are right that lexical sort will work fine and is
simpler than svn_sort_compare_items_as_paths.
The revised patch is attached. Here's a proposed commit message:
[[[
Issue #4134: svnadmin dump files are not reproducible, due to APR 1.4.6 hash order changes
This patch sorts deleted path names before dumping them.
* subversion/libsvn_repos/dump.c
(close_directory): sort hash deleted_entries before dumping.
Patch by: Dustin Lang <dstndstn_at_gmail.com>
Review by: stsp
(Stefan suggested several stylistic changes and using
svn_sort_compare_items_lexically.)
]]]
cheers,
dstn
On Tue, 29 May 2012, Stefan Sperling wrote:
> On Tue, May 29, 2012 at 03:09:08PM -0400, Dustin Lang wrote:
>>
>> Hi,
>>
>> This is my first time working on the subversion code, so be gentle :)
>>
>> I believe the attached patch solves one component of #4134
>> ("svnadmin dump files are not reproducible, due to APR 1.4.6 hash
>> order changes"), namely that deleted nodes are stored in a hash and
>> dumped out in (now-arbitrary) hash iteration order. This patch
>> simply uses "svn_sort_hash" to get a sorted view of the keys before
>> dumping them.
>>
>> I have verified that it runs valgrind-clean when "svnadmin dump"ing
>>> 700 revs of the astrometry.net repo (and now my repo and its
>> svnsync'd mirror produce exactly equal dumps, hoorah), but I don't
>> really know if I've used the apr infrastructure appropriately.
>>
>> I don't know whether this completely solves #4134; it works for me,
>> but my repo is simple.
>>
>> cheers,
>> dustin
>>
>
> This patch looks correct to me overall but I have some nits.
>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=4134
>
>> Index: subversion/libsvn_repos/dump.c
>> ===================================================================
>> --- subversion/libsvn_repos/dump.c (revision 1343881)
>> +++ subversion/libsvn_repos/dump.c (working copy)
>> @@ -34,6 +34,7 @@
>> #include "svn_time.h"
>> #include "svn_checksum.h"
>> #include "svn_props.h"
>> +#include "svn_sorts.h"
>>
>> #include "private/svn_mergeinfo_private.h"
>> #include "private/svn_fs_private.h"
>> @@ -736,12 +737,14 @@
>> struct edit_baton *eb = db->edit_baton;
>> apr_hash_index_t *hi;
>> apr_pool_t *subpool = svn_pool_create(pool);
>> + unsigned int Npaths = apr_hash_count(db->deleted_entries);
>
> No CamelCase please. You don't need this variable at all (see below).
>
>> + unsigned int i;
>> + apr_array_header_t* sorted;
>
> Maybe call the array 'sorted_deleted_entries', or 'sorted_entries', instead?
>
>>
>> - for (hi = apr_hash_first(pool, db->deleted_entries);
>> - hi;
>> - hi = apr_hash_next(hi))
>> + sorted = svn_sort__hash(db->deleted_entries, svn_sort_compare_items_as_paths, pool);
>
> Please try to wrap lines at 78 columns.
>
> You can use svn_sort_compare_items_lexically because entries are single
> path components that do not contain slashes. In which case
> svn_sort_compare_items_as_paths produces the same result but is less efficient.
>
>> + for (i = 0; i < Npaths; i++)
>
> Here, you can use the idiom:
>
> for (i = 0; i < sorted->nelts; i++)
>
> which means you don't need 'Npaths'.
>
>> {
>> - const char *path = svn__apr_hash_index_key(hi);
>> + const char* path = APR_ARRAY_IDX(sorted, i, svn_sort__item_t).key;
>
> Please put the asterisk next to the variable name for consistency with code
> formatting elsewhere. Same applies to the declaration of 'sorted' above.
>
>>
>> svn_pool_clear(subpool);
>>
>> @@ -753,6 +756,8 @@
>> FALSE, NULL, SVN_INVALID_REVNUM, subpool));
>> }
>>
>> + apr_array_clear(sorted);
>
> No need to do this. This doesn't actually free any memory, it just marks
> array slots for reuse. Array memory will be freed when the caller clears
> or destroys 'pool'. It's up to the caller to do this, so just delete that line.
>
>> +
>> svn_pool_destroy(subpool);
>> return SVN_NO_ERROR;
>> }
>
>
Received on 2012-05-29 23:01:47 CEST