
Re: New merge-sensitive log feature

From: <Stefan.Fuhrmann_at_etas.de>
Date: 2007-06-03 16:28:32 CEST

Hi Hyrum,

May I jump into the discussion?

Background: For TortoiseSVN, I implemented log caching (due for release
with version 1.5.0) based on the 1.4.x API. The current implementation
can handle really large histories of one million revisions and more.
The KDE.org and Apache.org repositories are used for benchmarking. If
those are handled within reasonable time and space limits, all others
should be fine as well.

It seems reasonable to include merge information in the log. For
instance, it should show up in the revision graph. To make that
feasible, it seems imperative to keep the amount of data transmitted
in O(#revs).

On 2007-06-03, Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu> wrote:

> Currently, there isn't a whole lot of path-based filtering going on with
> the merged revisions. If the mergeinfo has something like '/trunk:1-9',
> you'll get all nine revisions, 1-9, as child revisions. And since every
> copy starts out with something like '/copysource:1-rev_before_copy', we
> end up pulling back *a lot* of data. That data will need to be
> filtered, not based upon the destination path, but upon the merge source
> path.

> Another reason for the large volume of data, is that by running the
> command on the root of the repository, you're going to get multiple
> copies of some log messages, both the original message, as well as the
> copies of the message pulled in as the result of a merge. (See
> http://svn.haxx.se/dev/archive-2007-05/0446.shtml for a previous mail on
> the issue.) This is exacerbated by the fact that we're already pulling
> in extra revisions already due to the first problem.

> I'll brainstorm about these issues and try to get a workable solution
> sometime next week.
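To put a number on the first problem: a minimal C sketch (not Subversion code) that counts how many child revisions a single mergeinfo line expands to. It assumes a simplified syntax of comma-separated plain revisions and "a-b" ranges after the last ':'; count_merged_revisions is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Count the revisions listed in one simplified mergeinfo line such as
 * "/trunk:1-9" or "/branches/b:3,7-10". Hypothetical helper: only plain
 * revisions and "a-b" ranges separated by commas are understood.
 * Returns -1 for input it cannot parse. */
static long count_merged_revisions(const char *line)
{
    const char *p = strrchr(line, ':');
    long total = 0;

    if (p == NULL || p[1] == '\0')
        return -1;
    p++;                                   /* skip the ':' */

    while (*p != '\0') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;                     /* no number where expected */
        if (*end == '-') {
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        total += last - first + 1;         /* each revision becomes a child */

        if (*end == ',')
            p = end + 1;
        else if (*end == '\0')
            p = end;
        else
            return -1;
    }
    return total;
}
```

For '/trunk:1-9' this gives 9. A copy made in r500000 starts out with '/copysource:1-499999', i.e. 499999 child revisions before any path filtering is applied.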

Here is what I think. My concern is basically with the svn_client_log4
client API.

* Instead of the include_merged_revisions parameter there should be
  a merged_revisions_depth parameter. Valid values would be 0, 1
  and -1 (i.e. unlimited). Value 1 would return the list of child
  revisions immediately merged into the respective parent revision.

* Introduce a merged_revision_limit parameter. If not 0, it restricts
  the size of the merged revision sub-tree in the following way.

        merged_revision_count = 0
        revisions_to_report.push (start_revision)
        while (revisions_to_report.count > 0)
                revision = revisions_to_report.pop
                transmit_info (revision)

                if (merged_revision_count < merged_revision_limit)
                        sub_revisions = revisions_merged_into (revision)
                        revisions_to_report.append (sub_revisions)
                        merged_revision_count += sub_revisions.count
 
  Hence, for every node either all or none of its children is reported.
  Rationale: the list of merge-inputs of a given revision is part of
  that revision just like the list of changed paths.

  This parameter may also be used to replace merged_revisions_depth:
  only a value of -1 (unlimited) requires a special check. Due to the
  exponential growth of the tree, the number of nodes may exceed the
  range of a 32-bit counter.

* Report not only the merged revision(s) but also the path that has
  been merged. TSVN would use that information to draw the revision
  graph *for a given path*. Btw, that would be enough to reconstruct
  the content of svn:mergeinfo.

  For maximum efficiency, I propose the following API change. Add an
  apr_hash_t *changed_revisions member to svn_log_entry_t, holding a
  structure analogous to svn_log_changed_path_t:

        typedef struct svn_log_merged_revision_t
        {
                const char *mergedfrom_path;
                svn_revnum_t mergedfrom_rev;
        } svn_log_merged_revision_t;

  We could introduce discover_merged_revisions to control this new
  member in a way symmetric to discover_changed_paths.

  If set, it would let SVN fetch all direct merges, even if
  merged_revision_limit is 0. Likewise, discover_merged_revisions may
  be false while merged_revision_limit is not 0, causing the merged
  revisions to be reported as children. Of course,
  merged_revision_limit>0 and discover_merged_revisions=true is valid
  as well.
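The merged_revision_limit loop can be sketched in C roughly as follows. The merge graph is a hypothetical static table standing in for revisions_merged_into(); revnum_t, merged_into and report_with_limit are illustrative names, not Subversion APIs. A limit of -1 is treated as unlimited and the counter is 64 bits wide, following the remarks on exponential growth.

```c
#include <stddef.h>

typedef long revnum_t;            /* stand-in for svn_revnum_t */

#define MAX_REV      16
#define MAX_CHILDREN 4
#define STACK_CAP    64

/* Hypothetical merge graph: merged_into[r] lists the revisions merged
 * into r, terminated by 0. Stands in for revisions_merged_into(). */
static const revnum_t merged_into[MAX_REV][MAX_CHILDREN + 1] = {
    [10] = {7, 8, 0},
    [8]  = {5, 6, 0},
    [6]  = {2, 0},
};

/* Report `start` and its merged sub-tree into out[] (up to out_cap
 * entries), mirroring the pseudocode: the children of a node are pushed
 * all together or not at all. limit == -1 means unlimited; limit == 0
 * reports only `start`. Assumes every revision is < MAX_REV.
 * Returns the number of revisions reported. */
static size_t report_with_limit(revnum_t start, long long limit,
                                revnum_t *out, size_t out_cap)
{
    revnum_t stack[STACK_CAP];
    size_t top = 0;
    size_t reported = 0;
    long long merged_count = 0;   /* 64 bits: the tree grows fast */

    stack[top++] = start;
    while (top > 0) {
        revnum_t rev = stack[--top];         /* pop */
        if (reported < out_cap)
            out[reported] = rev;             /* transmit_info(rev) */
        reported++;

        if (limit == -1 || merged_count < limit) {
            const revnum_t *kids = merged_into[rev];
            for (size_t i = 0; kids[i] != 0; i++) {
                if (top == STACK_CAP)
                    return reported;         /* sketch: stop on overflow */
                stack[top++] = kids[i];      /* append all children */
                merged_count++;
            }
        }
    }
    return reported;
}
```

With start revision 10 and a limit of 3, revisions 10, 8, 6, 5 and 7 are reported: both children of r8 are pushed even though that takes the count to 4, which is the all-or-none rule at work.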

In summary, my "ideal, wished-for" svn_client_log4 would look like this:

        svn_error_t *
        svn_client_log4 (const apr_array_header_t *targets,
                         const svn_opt_revision_t *peg_revision,
                         const svn_opt_revision_t *start,
                         const svn_opt_revision_t *end,
                         int limit,
                         svn_boolean_t discover_changed_paths,
                         svn_boolean_t strict_node_history,
                         svn_boolean_t discover_merged_revisions,
                         int merged_revision_limit,
                         svn_boolean_t omit_log_text,
                         svn_log_message_receiver2_t receiver,
                         void *receiver_baton,
                         svn_client_ctx_t *ctx,
                         apr_pool_t *pool);
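The claim that merged paths plus revisions are enough to reconstruct svn:mergeinfo can be illustrated with a short sketch. It reuses the svn_log_merged_revision_t struct proposed above (a proposal, not a released API), assumes the records delivered with discover_merged_revisions have already been collected into an array sorted by path and revision, and emits a simplified "path:ranges" form. rebuild_mergeinfo is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef long svn_revnum_t;  /* same underlying type as in svn_types.h */

/* The structure proposed in this mail; not part of a released API. */
typedef struct svn_log_merged_revision_t
{
    const char *mergedfrom_path;
    svn_revnum_t mergedfrom_rev;
} svn_log_merged_revision_t;

/* Rebuild mergeinfo-style text, one "path:revs" line per merge source,
 * collapsing consecutive revisions into "a-b" ranges. Assumes recs[] is
 * sorted by path, then revision, without duplicates.
 * Returns 0 on success, -1 if buf is too small. */
static int rebuild_mergeinfo(const svn_log_merged_revision_t *recs, size_t n,
                             char *buf, size_t buf_size)
{
    size_t used = 0;
    size_t i = 0;
    int len;

    if (buf_size == 0)
        return -1;
    buf[0] = '\0';

    while (i < n) {
        const char *path = recs[i].mergedfrom_path;
        const char *sep = "";

        /* Start a new line for each merge source path. */
        len = snprintf(buf + used, buf_size - used, "%s%s:",
                       used > 0 ? "\n" : "", path);
        if (len < 0 || (size_t)len >= buf_size - used)
            return -1;
        used += (size_t)len;

        while (i < n && strcmp(recs[i].mergedfrom_path, path) == 0) {
            svn_revnum_t first = recs[i].mergedfrom_rev;
            svn_revnum_t last = first;

            /* Extend the range while revisions are consecutive. */
            for (i++; i < n && strcmp(recs[i].mergedfrom_path, path) == 0
                      && recs[i].mergedfrom_rev == last + 1; i++)
                last++;

            if (first == last)
                len = snprintf(buf + used, buf_size - used, "%s%ld",
                               sep, first);
            else
                len = snprintf(buf + used, buf_size - used, "%s%ld-%ld",
                               sep, first, last);
            if (len < 0 || (size_t)len >= buf_size - used)
                return -1;
            used += (size_t)len;
            sep = ",";
        }
    }
    return 0;
}
```

Records for /branches/a r3, r4, r5, r9 and /trunk r7 come out as "/branches/a:3-5,9" and "/trunk:7" on separate lines.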

Please don't feel offended by this lengthy this-needs-to-be-changed post.
I will be fine with any solution that does not result in exponential
data growth.

Regards,
Stefan^2.
Received on Sun Jun 3 16:28:44 2007

This is an archived mail posted to the Subversion Dev mailing list.
