
Re: New merge-sensitive log feature

From: <Stefan.Fuhrmann_at_etas.de>
Date: 2007-06-05 13:27:36 CEST

"Hyrum K. Wright" <hyrum_wright@mail.utexas.edu> wrote:

> > It seems reasonable to include merge information into the log. For
> > instance, it should show up in the revision graph. To do that, it
> > seems imperative to keep the amount of data transmitted O(#revs).

> Let me see if I can better understand the problem you are trying to
> solve. You would like to include merge information in log messages, but
> don't want to fetch child log messages. Basically some means of finding
> out which revisions were merged into a branch in a given revision. Is
> this correct?

Yes. To detail the problem:

* The way TortoiseSVN works, we always need & fetch full changed-path
  information for every revision: server roundtrips take > 1 sec over
  the internet, hence fetching the detail information on demand is not
  an option for an interactive GUI.
  Today, fetching many log entries already takes a considerable amount
  of time. Thus, transmitting revision info twice or more will hurt
  performance. Although a small factor may be acceptable, a definite
  upper limit on this repetition factor is necessary.
* Server roundtrips may have a large latency. Therefore, all information
  required to display the first, say 100, revisions should be available
  through a (very) limited number of log API calls. Having a way to
  instruct svn_client_log4 to return just the right amount of data
  would be an ideal solution (for TSVN, that is).
* Merge information can be very extensive (see also below): when
  synchronizing your feature branch with the trunk, 10,000 revisions
  or more may get merged into less than 100 revisions of your branch.
  So, a way to limit the total amount of revision info that will
  be transferred is needed.
* These may be conflicting goals, depending on details of the API design.

> If so, I think that's a valid use case, but I'm not sure that 'svn log
> -g' is the right place to put it.

I am not concerned with "svn log -g" as such. But I consider it a
shallow wrapper around svn_client_log4, so to me, both seem to be
tightly coupled with respect to their capabilities / flexibility.

Moreover, there will be users who use the --xml output for further
processing. To me, it seems likely that similar restrictions apply
to those use cases as to TSVN.

> > (long description of many possible changes to svn_client_log4)

> Hmm, this seems pretty confusing at first blush. Having multiple
> options which are dependent upon each other can be confusing for API
> implementors as well as users.

Sorry for that confusion; I was developing ideas while writing
the post. Not an ideal way to communicate one's ideas, admittedly.

But basically, I would like to see two changes / extensions:

* Introduce discover_merged_revisions and return the merged
  (path, revision) pairs in a hash just like the changed_paths.
  Drop include_merged_revisions parameter.
* Introduce a "--limit"-like option: merged_revision_limit.
  How that restricts the merged revision tree is up to the
  API implementation, as long as it does limit the amount of
  data retrieved.

TSVN, for instance, would use both options: the first one to discover
and cache the merged revision relationship and the second to fetch
a "reasonable but limited" amount of additional information on those
merged revisions. If the user is not satisfied with the result, he
or she will push a "get more" button and TSVN will issue a less
restricted query.

A nice-to-have extension would be a "report_merged_revision_once"
parameter that would return any merged revision sub-tree at most once.
Without discover_merged_revisions being set, the result may be
useless, though (depending on how repeated revisions are reported).
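To make the intended semantics of the three proposed parameters concrete, here is a small sketch in Python. This is *not* the real svn_client_log4 signature; the parameter names come from my proposal above, and the revision model (dicts with a list of merged (path, revision) pairs) is purely illustrative.

```python
# Hypothetical sketch of the proposed semantics -- not the real
# svn_client_log4 API. Each revision is a dict; its merge relationship
# is a list of (path, revision) pairs, analogous to changed_paths.

def log_with_merges(history, discover_merged_revisions=False,
                    merged_revision_limit=None,
                    report_merged_revision_once=False):
    """Return log entries, optionally annotated with merged revisions.

    discover_merged_revisions: attach the merged (path, revision)
        pairs to each entry, just like the changed_paths hash.
    merged_revision_limit: cap the total number of merged-revision
        records transferred, analogous to --limit.
    report_merged_revision_once: report any merged revision at most
        once across the whole result.
    """
    seen = set()                      # dedup across the whole query
    budget = merged_revision_limit    # None means "no limit"
    result = []
    for rev in history:
        entry = {'revision': rev['revision'],
                 'changed_paths': rev['changed_paths']}
        if discover_merged_revisions:
            merged = []
            for pair in rev.get('merged', []):
                if report_merged_revision_once and pair in seen:
                    continue
                if budget is not None and budget <= 0:
                    break             # total transfer limit reached
                merged.append(pair)
                seen.add(pair)
                if budget is not None:
                    budget -= 1
            entry['merged_revisions'] = merged
        result.append(entry)
    return result
```

TSVN's usage pattern would then be: first call with discover_merged_revisions set and a small merged_revision_limit, and re-issue the query with a higher limit when the user presses "get more".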

> Your concern is a valid one and I appreciate the feedback. It should be
> noted that exponential data growth only really happens when 'svn log -g'
> is run on the root of a repository. If run on '/trunk' or
> '/branches/1.x', you can expect that revisions will only be shown once,
> because they will have only been committed to the branch once.

According to my understanding of the merge-tracking specs, this
assumption is not valid. If I am correct, the theoretical limit is
O(2^revCount). More typical usage will result in O(n log n).
Please tell me if I am missing an important point, because without
formal proof, all those complexity arguments are rather guesstimates.

Consider the following example. At least in our company, we use
a small number of feature branches that are synchronized with
/trunk from time to time. Note that those sync-merges may
require additional changes - adaptations to the changes on the
respective branch - that ultimately will be merged back into /trunk.
(This assumption may not be true - maybe this merge point
will not show up in the log for /trunk?)

As a result, the log for /trunk will show not only the sum
of all changes on all (closed & merged) branches but also its
own history, distributed over the branches' sync-points, *once for
every branch*. If we assume that the revision count is a measure
of the project's complexity and the number of active branches
is roughly log(complexity), this yields O(n log n) revisions
being reported for svn log -g /trunk.

But things get even worse. Feature branches have different
lifetimes, i.e. they are created and closed as necessary. That
means after a branch gets merged back into /trunk, every open
branch will merge the closed branch's history into its own,
including the /trunk changes for a second time. It is easy to
see how this roughly doubles the size of the merge tree for
every "generation" of branches.

The reason for this growth seems to be the "compression" of
a whole sequence of multiple merge trees into a single node
of another. So, every branch / merge "generation" adds another
level to the merge tree.

In summary, the merged revision trees may (and will) indeed grow
slowly but still exponentially. This does *not* render merge
tracking useless, but it is an indication of how to handle that data.
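The doubling-per-generation argument can be checked with a back-of-envelope model (my own toy model, not derived from the merge-tracking spec): each generation, one branch is merged back to /trunk carrying a copy of trunk's entire current merge tree, which trunk then absorbs.

```python
# Toy model of merge-tree growth on /trunk. Assumptions (mine):
# one branch closes per generation, and the branch merged back
# carries a full copy of trunk's merge tree from its last sync.

def merge_tree_size(generations, revs_per_branch=100):
    trunk_tree = revs_per_branch        # merged-revision records on /trunk
    for _ in range(generations):
        # the closed branch's tree = its own revisions + trunk's tree
        branch_tree = revs_per_branch + trunk_tree
        trunk_tree += branch_tree       # trunk absorbs the whole branch tree
    return trunk_tree

sizes = [merge_tree_size(g) for g in range(6)]
# the recurrence t(g+1) = 2*t(g) + revs_per_branch roughly doubles
# the tree each generation, i.e. t(g) = (2^(g+1) - 1) * revs_per_branch
```

Note the contrast: the repository itself gains only revs_per_branch revisions per generation (linear), while the merged-revision records reported for "svn log -g /trunk" roughly double each generation - exactly the slow-but-exponential growth described above.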

> Just like checking out the root of a repository results in 'exponential'
> data growth, running 'svn log -g' on the root of the repository will
> result in the same. The solution right now is "Don't Do That".

On the surface, the effect is the same. However, a checkout will
only grow linearly with the number of branches (and tags).
While merge history will not grow with the number of tags,
it will grow exponentially with the number of branches, or
branch generations.

But I am somewhat optimistic that certain policies may effectively
limit this growth. Detailed analysis may show a way to do this.

-- Stefan^2.
Received on Tue Jun 5 13:27:48 2007

This is an archived mail posted to the Subversion Dev mailing list.
