Auto-merging without ancestry checks

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2005-02-27 12:47:19 CET

If you've been following
http://svn.haxx.se/dev/archive-2005-02/0897.shtml ("ugly problem found
while trying to test KDE SVN"), then you've seen me conclude that our
auto-merging code can be ruinously inefficient on big repositories.
Although Eric has fixed the auto-merge memory leaks in FSFS (his
change will need to be back-ported to BDB), doing multiple complete
predecessor walks in a repository like kde-i18n is prohibitively slow.

I'm not prepared to rewrite the auto-merging code. But I can present
a design which I think is simpler, more maintainable, and doesn't
involve asking whether any node is an ancestor of any other node.

Here it is, then. We want to merge three directories: ANCESTOR (which
is the revision our transaction is based off of), SOURCE (which is the
current head revision, containing changes made to the repository while
the transaction was in progress), and TARGET (which is the transaction
to be committed).

For each entry NAME in the directory ANCESTOR:

    Let ANCESTOR-ENTRY, SOURCE-ENTRY, and TARGET-ENTRY be the IDs of
    the name within ANCESTOR, SOURCE, and TARGET respectively.
    (Possibly null if NAME does not exist in SOURCE or TARGET.)

    If ANCESTOR-ENTRY == SOURCE-ENTRY, then:
      No changes were made to this entry while the transaction was in
      progress, so do nothing to the target.

    Else if ANCESTOR-ENTRY == TARGET-ENTRY, then:
      A change was made to this entry while the transaction was in
      process, but the transaction did not touch this entry. Replace
      TARGET-ENTRY with SOURCE-ENTRY.

    Else:
      Changes were made to this entry both within the transaction and
      to the repository while the transaction was in progress. They
      must be merged or declared to be in conflict.

      If SOURCE-ID and TARGET-ID are both null, do nothing. (This is
      a double delete. We could flag a conflict, but the current
      auto-merge allows this case.) If either SOURCE-ID or TARGET-ID
      is null but not the other one, flag a conflict. A deletion is
      incompatible with a modification.

If any of the three entries is of type file, declare a conflict.

      If either SOURCE-ENTRY or TARGET-ENTRY is not a direct
      modification of ANCESTOR-ENTRY (determine by comparing the
      node-id and copy-id fields), declare a conflict. A replacement
      is incompatible with a modification or other replacement--even
      an identical replacement.

      Direct modifications were made to the directory ANCESTOR-ENTRY
      in both SOURCE and TARGET. Recursively merge these
      modifications.

For each leftover entry NAME in the directory SOURCE:

    If NAME exists in TARGET, declare a conflict. Even if SOURCE and
    TARGET are adding exactly the same thing, two additions are not
    auto-mergeable with each other. (This would be a change from the
    current auto-merge code, which allows adds of identical
    node-revs.)

Add NAME to TARGET with the entry from SOURCE.

Now that we are done merging the changes from SOURCE into the
directory TARGET, update TARGET's predecessor to be SOURCE.

If you can follow the above algorithm, it most likely makes sense, but
you're probably wondering why the existing auto-merge code asks
questions about ancestry. Here is what I could determine:

  * If changes were made to an entry in both SOURCE and TARGET, the
    current auto-merge code checks if TARGET is a descendent of
    SOURCE, and does nothing in that case. But that's essentially
    impossible; it could only happen if SOURCE replaces the entry with
    blah and TARGET replaces the entry with some descendent of blah,
    and that really ought to be a conflict.

  * The current auto-merge code only updates the predecessor of a
    merged directory if there is no existing ordering between those
    two entries. Since my algorithm only attempts to merge
    directories if SOURCE and TARGET are direct modifications of
    ANCESTOR, there is never an existing ordering between SOURCE and
    TARGET. So there is no need to check.

Another thing I've omitted from my algorithm is described in issue
#418 (http://subversion.tigris.org/issues/show_bug.cgi?id=418). For
reasons I do not understand, the current auto-merge code allows a
deletion in SOURCE to be merged with a replacement in TARGET (by
ignoring the deletion). Supposedly, this is for ease of use, but
neither the code nor issue #418 describes the use case which is being
made easier. Perhaps this dates back to when commits were made
against the working copy base revision, not against HEAD? If we do
need the allowance described in issue #418, it doesn't require any
ancestry checks, just a relatedness check.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Feb 27 12:48:33 2005

This message: [ Message body ]
Next message: Peter N. Lundblad: "Re: new functions in 1.2"
Previous message: C. Michael Pilato: "Re: new functions in 1.2"
Next in thread: Ben Collins-Sussman: "Re: Auto-merging without ancestry checks"
Reply: Ben Collins-Sussman: "Re: Auto-merging without ancestry checks"
Reply: Eric Gillespie: "Re: Auto-merging without ancestry checks"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]