If you've been following
http://svn.haxx.se/dev/archive-2005-02/0897.shtml ("ugly problem found
while trying to test KDE SVN"), then you've seen me conclude that our
auto-merging code can be ruinously inefficient on big repositories.
Although Eric has fixed the auto-merge memory leaks in FSFS (his
change will need to be back-ported to BDB), doing multiple complete
predecessor walks in a repository like kde-i18n is prohibitively slow.
I'm not prepared to rewrite the auto-merging code. But I can present
a design which I think is simpler, more maintainable, and doesn't
involve asking whether any node is an ancestor of any other node.
Here it is, then. We want to merge three directories: ANCESTOR (which
is the revision our transaction is based off of), SOURCE (which is the
current head revision, containing changes made to the repository while
the transaction was in progress), and TARGET (which is the transaction
to be committed).
For each entry NAME in the directory ANCESTOR:
Let ANCESTOR-ENTRY, SOURCE-ENTRY, and TARGET-ENTRY be the IDs of
the name within ANCESTOR, SOURCE, and TARGET respectively.
(Possibly null if NAME does not exist in SOURCE or TARGET.)
If ANCESTOR-ENTRY == SOURCE-ENTRY, then:
No changes were made to this entry while the transaction was in
progress, so do nothing to the target.
Else if ANCESTOR-ENTRY == TARGET-ENTRY, then:
A change was made to this entry while the transaction was in
process, but the transaction did not touch this entry. Replace
TARGET-ENTRY with SOURCE-ENTRY.
Else:
Changes were made to this entry both within the transaction and
to the repository while the transaction was in progress. They
must be merged or declared to be in conflict.
If SOURCE-ID and TARGET-ID are both null, do nothing. (This is
a double delete. We could flag a conflict, but the current
auto-merge allows this case.) If either SOURCE-ID or TARGET-ID
is null but not the other one, flag a conflict. A deletion is
incompatible with a modification.
If any of the three entries is of type file, declare a conflict.
If either SOURCE-ENTRY or TARGET-ENTRY is not a direct
modification of ANCESTOR-ENTRY (determine by comparing the
node-id and copy-id fields), declare a conflict. A replacement
is incompatible with a modification or other replacement--even
an identical replacement.
Direct modifications were made to the directory ANCESTOR-ENTRY
in both SOURCE and TARGET. Recursively merge these
modifications.
For each leftover entry NAME in the directory SOURCE:
If NAME exists in TARGET, declare a conflict. Even if SOURCE and
TARGET are adding exactly the same thing, two additions are not
auto-mergeable with each other. (This would be a change from the
current auto-merge code, which allows adds of identical
node-revs.)
Add NAME to TARGET with the entry from SOURCE.
Now that we are done merging the changes from SOURCE into the
directory TARGET, update TARGET's predecessor to be SOURCE.
If you can follow the above algorithm, it most likely makes sense, but
you're probably wondering why the existing auto-merge code asks
questions about ancestry. Here is what I could determine:
* If changes were made to an entry in both SOURCE and TARGET, the
current auto-merge code checks if TARGET is a descendent of
SOURCE, and does nothing in that case. But that's essentially
impossible; it could only happen if SOURCE replaces the entry with
blah and TARGET replaces the entry with some descendent of blah,
and that really ought to be a conflict.
* The current auto-merge code only updates the predecessor of a
merged directory if there is no existing ordering between those
two entries. Since my algorithm only attempts to merge
directories if SOURCE and TARGET are direct modifications of
ANCESTOR, there is never an existing ordering between SOURCE and
TARGET. So there is no need to check.
Another thing I've omitted from my algorithm is described in issue
#418 (http://subversion.tigris.org/issues/show_bug.cgi?id=418). For
reasons I do not understand, the current auto-merge code allows a
deletion in SOURCE to be merged with a replacement in TARGET (by
ignoring the deletion). Supposedly, this is for ease of use, but
neither the code nor issue #418 describes the use case which is being
made easier. Perhaps this dates back to when commits were made
against the working copy base revision, not against HEAD? If we do
need the allowance described in issue #418, it doesn't require any
ancestry checks, just a relatedness check.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Feb 27 12:48:33 2005