Stefan.Fuhrmann@etas.de wrote:
> Hi Hyrum,
>
> may I jump into the discussion?
Anytime. I'm very much interested in additional input.
> Background: For TortoiseSVN, I implemented log caching (due to release
> with version 1.5.0) based on the 1.4.x API. The current implementation
> is quite capable of dealing with really large histories of one million
> revisions and more. The KDE.org and Apache.org repositories are used
> for benchmarking. If those are handled with reasonable time and space
> requirements, all others should be fine as well.
>
> It seems reasonable to include merge information into the log. For
> instance, it should show up in the revision graph. To do that, it seems
> imperative to keep the amount of data transmitted O(#revs).
Let me see if I can better understand the problem you are trying to
solve. You would like to include merge information in log messages, but
don't want to fetch child log messages. Basically some means of finding
out which revisions were merged into a branch in a given revision. Is
this correct?
If so, I think that's a valid use case, but I'm not sure that 'svn log
-g' is the right place to put it.
> on 2007-06-03, Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu> wrote:
>
>> Currently, there isn't a whole lot of path-based filtering going on with
>> the merged revisions. If the mergeinfo has something like '/trunk:1-9',
>> you'll get all nine revisions, 1-9, as child revisions. And since every
>> copy starts out with something like '/copysource:1-rev_before_copy', we
>> end up pulling back *a lot* of data. That data will need to be
>> filtered, not based upon the destination path, but upon the merge source
>> path.
>
>> Another reason for the large volume of data is that by running the
>> command on the root of the repository, you're going to get multiple
>> copies of some log messages: both the original message and the
>> copies of the message pulled in as the result of a merge. (See
>> http://svn.haxx.se/dev/archive-2007-05/0446.shtml for a previous mail on
>> the issue.) This is exacerbated by the fact that we're already pulling
>> in extra revisions due to the first problem.
>
>> I'll brainstorm about these issues and try to get a workable solution
>> sometime next week.
>
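To put a rough number on that expansion: the sketch below counts how many
child revisions a single mergeinfo line such as '/trunk:1-9' names. It is
only an illustration, not Subversion's parser. The helper name
count_merged_revisions is made up, and the sketch assumes a simplified
"path:range[,range...]" syntax with no other markers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of any Subversion API.
 * Counts the revisions named by one simplified mergeinfo line,
 * e.g. "/trunk:1-9,12". Returns -1 if the line is malformed. */
long count_merged_revisions(const char *line)
{
    assert(line != NULL);

    /* The revision list follows the last ':' on the line. */
    const char *p = strrchr(line, ':');
    if (p == NULL)
        return -1;
    p++;

    long total = 0;
    while (*p != '\0') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;
        long last = first;
        p = end;

        if (*p == '-') {              /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;
        }
        total += last - first + 1;    /* ranges are inclusive */

        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;
    }
    return total;
}
```

Every revision counted this way becomes a child log message unless it is
filtered by merge source path, which is why the initial
'/copysource:1-rev_before_copy' entry on a copy is so expensive.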
> Here is what I think. My concern is basically with the svn_client_log4
> client API.
>
> * Instead of the include_merged_revisions parameter there should be
> a merged_revisions_depth parameter. Valid values would be 0, 1
> and -1 (i.e. unlimited). Value 1 would return the list of child
> revisions immediately merged into the respective parent revision.
>
> * Introduce a merged_revision_limit parameter. If not 0, it restricts
> the size of the merged revision sub-tree in the following way.
>
>   merged_revision_count = 0
>   revisions_to_report.push (start_revision)
>   while (revisions_to_report.count > 0)
>     revision = revisions_to_report.pop
>     transmit_info (revision)
>
>     if (merged_revision_count < merged_revision_limit)
>       sub_revisions = revisions_merged_into (revision)
>       revisions_to_report.append (sub_revisions)
>       merged_revision_count += sub_revisions.count
>
> Hence, for every node either all or none of its children is reported.
> Rationale: the list of merge-inputs of a given revision is part of
> that revision just like the list of changed paths.
>
> This parameter may also be used to replace merged_revisions_depth:
> only -1 requires a special check. Due to the exponential growth of
> the tree, the number of nodes may exceed the range of a 32 bit counter.
>
> * Report not only the merged revision(s) but also the path that has
> been merged. TSVN would use that information to draw the revision
> graph *for a given path*. Btw, that would be enough to reconstruct
> the content of svn:mergeinfo.
>
> For maximum efficiency, I propose the following API change. Add an
> apr_hash_t *changed_revisions to svn_log_entry_t with a structure
> analogous to svn_log_changed_path_t:
>
>   typedef struct svn_log_merged_revision_t
>   {
>     const char *mergedfrom_path;
>     svn_revnum_t mergedfrom_rev;
>   } svn_log_merged_revision_t;
>
> We could introduce discover_merged_revisions to control this new
> member in a way symmetric to discover_changed_paths.
>
> If set, it would let SVN fetch all direct merges, even if
> merged_revision_limit is 0. Likewise, discover_merged_revisions
> may be false while merged_revision_limit is not 0, causing the
> merged revisions to be reported as children. Of course,
> merged_revision_limit>0 and discover_merged_revisions=true is
> valid as well.
Hmm, this seems pretty confusing at first blush. Options that depend
upon each other are hard on API implementors as well as users.
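For what it's worth, here is how I read the limit rule in the pseudocode
above, written out as a small C sketch. Nothing in it is Subversion API:
report_with_limit, merged_into_fn, toy_merged_into and the fixed-size
queue are all made up for illustration, and the toy merge graph is
invented data.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 1024   /* sketch only: fixed-size, non-circular queue */

/* Hypothetical lookup: writes the revisions merged into `rev` to `out`
 * (at most `cap` entries) and returns how many it wrote. */
typedef size_t (*merged_into_fn)(long rev, long *out, size_t cap);

/* Invented merge graph: r10 merged r7 and r8, r7 merged r3 and r4,
 * r8 merged r5. */
size_t toy_merged_into(long rev, long *out, size_t cap)
{
    static const long edges[][2] = {
        {10, 7}, {10, 8}, {7, 3}, {7, 4}, {8, 5}
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
        if (edges[i][0] == rev && n < cap)
            out[n++] = edges[i][1];
    return n;
}

/* Breadth-first walk from `start`. Each visited revision is stored in
 * `reported` (capacity MAX_QUEUE), standing in for transmit_info().
 * Returns the number of revisions reported. */
size_t report_with_limit(long start, long limit,
                         merged_into_fn merged_into, long *reported)
{
    assert(merged_into != NULL && reported != NULL);

    long queue[MAX_QUEUE];
    size_t head = 0, tail = 0, count = 0;
    long merged_count = 0;

    queue[tail++] = start;
    while (head < tail) {
        long rev = queue[head++];
        reported[count++] = rev;

        if (merged_count < limit) {
            /* Queue all of this node's children, or none of them. */
            size_t n = merged_into(rev, queue + tail, MAX_QUEUE - tail);
            tail += n;
            merged_count += (long)n;
        }
    }
    return count;
}
```

With the toy graph and a limit of 3, the walk still reports four merged
revisions (7, 8, 3 and 4), because the children of a node are queued
all-or-none. So the limit is a soft bound, and a limit of 0 reports only
the starting revision.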
> In summary, my "ideal, wished-for" svn_client_log4 would look like this
>
>   svn_error_t *
>   svn_client_log4 (const apr_array_header_t *targets,
>                    const svn_opt_revision_t *peg_revision,
>                    const svn_opt_revision_t *start,
>                    const svn_opt_revision_t *end,
>                    int limit,
>                    svn_boolean_t discover_changed_paths,
>                    svn_boolean_t strict_node_history,
>                    svn_boolean_t discover_merged_revisions,
>                    int merged_revision_limit,
>                    svn_boolean_t omit_log_text,
>                    svn_log_message_receiver2_t receiver,
>                    void *receiver_baton,
>                    svn_client_ctx_t *ctx,
>                    apr_pool_t *pool);
>
> Please don't feel offended by the lengthy this-needs-to-be-changed-post.
> I will be fine with any solution that does not result in exponential
> data growth.
Your concern is a valid one and I appreciate the feedback. It should be
noted that exponential data growth only really happens when 'svn log -g'
is run on the root of a repository. If run on '/trunk' or
'/branches/1.x', you can expect that revisions will only be shown once,
because they will have only been committed to the branch once.
Just like checking out the root of a repository results in 'exponential'
data growth, running 'svn log -g' on the root of the repository will
result in the same. The solution right now is "Don't Do That".
-Hyrum
Received on Mon Jun 4 16:47:32 2007