Re: Subversion in 2010

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 5 Jan 2010 10:29:21 +0000

On Tue, Jan 05, 2010 at 01:02:07AM -0500, Mark Mielke wrote:
> On 01/04/2010 02:32 PM, Stefan Sperling wrote:
> >On Mon, Jan 04, 2010 at 01:45:07PM -0500, Mark Mielke wrote:
> >>
> >>If it doesn't resolve them (any? all?) yet, then this would explain
> >>one of the results I saw and couldn't explain. It knew the files had
> >>moved, it said it completed the merge - but the merge was missing. I
> >>became too busy to chase it down! :-(
> >Out of curiosity, where did you get the idea from that Subversion
> >could resolve tree conflicts for you?
> >Is there documentation which is not clear enough and needs to be fixed?
>
> The release notes and the documentation. Reading it closer - I see
> that it clearly states "detection" and not "resolution", however, I
> think a casual reader or an optimistic reader, such as myself, would
> easily come to the conclusion that "detection" implied resolution.
> That is, why detect if the knowledge will not be used to make the
> right decision?

Because, contrary to what you see on the surface, the knowledge is not
really there.

> Reading it again, I am still failing to see why detection would not
> imply resolution. If the code can figure out what happened - why can
> it not act on this knowledge? If we know that A moved to B at the
> same time as it changed from revision X to revision Y, why not do
> the three way diff between A:X, B:Y, and the source?

The fundamental problem is that while you know that something has
moved and where to, Subversion does not. It just knows about deletions,
and "adds with history", i.e. files which were added with a back-pointer
saying "I came from this/other/path@REV".

That's all the data you can work with; that's the data model as far as
renames go (now compare this to git or hg and you'll start to see
a bit of a difference).
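
To make that concrete, here is a rough sketch of what a rename looks
like in this model. The Change record and its field names are made up
for illustration; they are not Subversion's actual structures or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Change:
    """One changed path in a revision (hypothetical record for illustration)."""
    action: str                                  # "D" = delete, "A" = add
    path: str
    copyfrom: Optional[Tuple[str, int]] = None   # (source path, source revision)

# A rename of old/name.c to new/name.c is stored as two separate changes.
rename = [
    Change("D", "old/name.c"),
    Change("A", "new/name.c", copyfrom=("old/name.c", 41)),
]

def came_from(change: Change) -> Optional[Tuple[str, int]]:
    """Follow the back-pointer of an add-with-history, if there is one."""
    return change.copyfrom

# The add knows its source; the delete knows nothing about a destination.
delete_half, add_half = rename
```

The add half can answer "where did I come from?", but the delete half
carries no pointer at all, which is the asymmetry described above.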

A human can match up the deletes with the adds-with-history when
browsing the log, but it's hard to make a program do it both reliably
and efficiently.

In fact, it turns out that by looking just at deletion events a program
gets all information it will ever be able to efficiently retrieve in
the current data model.

For example, you get an incoming delete during an update. You see
that the same path is already locally deleted. This can mean:
1) Both sides deleted the path -> no conflict
2) The path was deleted in the repository, and moved locally -> conflict
3) The path was moved in the repository, and deleted locally -> conflict
4) The path was moved in the repository and moved locally -> conflict if
the target locations of the two moves differ

Subversion currently has to put all these cases into the same basket.
It flags them all as a tree conflict. 1) is a false positive and 4) might
be a false positive.
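
The four cases can be sketched as follows. This is only an illustration
of the reasoning, not Subversion code; the Fate enum and both functions
are hypothetical names. The first function assumes we know whether each
side deleted or moved the path (which Subversion currently does not),
the second sees only the delete halves.

```python
from enum import Enum

class Fate(Enum):
    DELETED = "deleted"
    MOVED = "moved"

def informed_verdict(repo, local):
    """Verdict when each side's fate and move target are known.

    repo and local are (Fate, target_path_or_None) tuples.
    """
    (repo_fate, repo_target), (local_fate, local_target) = repo, local
    if repo_fate is Fate.DELETED and local_fate is Fate.DELETED:
        return "no conflict"                      # case 1
    if repo_fate is Fate.MOVED and local_fate is Fate.MOVED:
        # case 4: a real conflict only if the targets differ
        return "no conflict" if repo_target == local_target else "conflict"
    return "conflict"                             # cases 2 and 3

def delete_only_verdict(repo, local):
    """Verdict when only the delete half of each change is visible."""
    # A move and a plain delete both show up as "the path is gone",
    # so the fate and target arguments cannot be consulted here.
    return "tree conflict"
```

With full knowledge, case 1 and the same-target variant of case 4 come
out as "no conflict"; with deletes only, every case gets the same answer.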

There is a tool called "truMerge" that tries to do better, and you might
want to give it a very close look since you do a lot of refactoring:
http://trumerge.open.collab.net/
It essentially tries to automate the log parsing tasks a human can do.
It is quite slow but can merge renames correctly. It is used in production
by at least one enterprise, but is still more of a proof-of-concept
implementation that shows how Subversion could handle renames within
the current data model. But AFAIK it only performs kind of well if
the log is already stored in a file on local disk.

> What is required to take detection to the next step?

Subversion needs to amend its data model to provide copy-to information,
to complement the current copy-from information. Then, renames could be
handled efficiently, and simple tree conflicts involving renames could
be detected reliably (if we also enforce that the delete and add half
of a rename are always committed in the same revision, which wc-ng will
allow us to do).
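
As a rough sketch of what copy-to information buys you: the code below
inverts copy-from pointers into a copy-to index and then uses it to tell
a rename from a plain delete. The input layout and the function names
are assumptions made for this example, not anything Subversion provides.
Note that building the index this way means scanning the whole log,
which is exactly the cost that storing copy-to data in the repository
would avoid.

```python
from collections import defaultdict

def build_copy_to(revisions):
    """Invert copy-from pointers into a copy-to index.

    revisions: {revnum: [(action, path, copyfrom_path_or_None), ...]}
    Returns {source_path: [(revnum, destination_path), ...]}.
    """
    copy_to = defaultdict(list)
    for rev in sorted(revisions):
        for action, path, copyfrom in revisions[rev]:
            if action == "A" and copyfrom is not None:
                copy_to[copyfrom].append((rev, path))
    return dict(copy_to)

def rename_target(revisions, copy_to, rev, deleted_path):
    """Return where deleted_path moved to in rev, or None for a plain delete.

    Assumes both halves of a rename are committed in the same revision.
    """
    deleted = {path for action, path, _ in revisions[rev] if action == "D"}
    if deleted_path not in deleted:
        return None
    for copy_rev, destination in copy_to.get(deleted_path, []):
        if copy_rev == rev:
            return destination
    return None

# Hypothetical history: r5 renames a.c to b.c, r6 deletes x.c outright.
history = {
    5: [("D", "a.c", None), ("A", "b.c", "a.c")],
    6: [("D", "x.c", None)],
}
index = build_copy_to(history)
```

With the index in hand, looking up the fate of a deleted path is a
single dictionary access instead of a log crawl.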

This is easier said than done. It implies repository format changes.
We'd need a way to modify old revisions to store this information
because the copy-to data needs to sit at the copy *source*.
This isn't currently possible in FSFS. Using a separate database for
this copy-to data would work, too, but that's a gross hack and might
not perform well (but users might not care too much about the performance
aspect since they're already used to Subversion being slow :)

Note that wc-ng will start to store copy-to information in the working
copy, so we can at least use it for local renames. For renames coming
in from the server during updates or merges we need copy-to information
stored in the repository.

Note that lack of copy-to information also means that guessing renames isn't
an option. You'd need to compare a given deleted file with all files in
all later revisions of the entire repository. In git, this operation is
easy since all data is stored on local disk and they mostly walk a tree
of SHA1 sums. But in Subversion you just could not get it to perform well.
And once you have the copy-to info you need to make this perform well,
you don't need to guess renames anymore.
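
To give a feel for the cost, here is a naive sketch of rename guessing
by content comparison. It assumes a full tree snapshot per revision is
available as an in-memory dict, which is a made-up convenience for the
example; against a real repository every one of these comparisons would
mean fetching data from the server.

```python
def guess_rename(trees, deleted_path, deleted_rev):
    """Guess where a deleted file went by comparing contents.

    trees: {revnum: {path: content}}, one full snapshot per revision
    (hypothetical layout). Scans the deleting revision and all later ones.
    Returns (matches, comparisons), where matches is a list of
    (revnum, path) pairs whose content equals the deleted file's content.
    """
    wanted = trees[deleted_rev - 1][deleted_path]
    matches = []
    comparisons = 0
    for rev in sorted(r for r in trees if r >= deleted_rev):
        for path, content in trees[rev].items():
            comparisons += 1
            if content == wanted and path != deleted_path:
                matches.append((rev, path))
    return matches, comparisons

# Hypothetical three-revision history: a.c disappears in r2.
trees = {
    1: {"a.c": "x", "z.c": "z"},
    2: {"b.c": "x", "z.c": "z"},
    3: {"b.c": "x", "z.c": "zz"},
}
matches, comparisons = guess_rename(trees, "a.c", 2)
```

The number of comparisons grows with the number of files times the
number of later revisions, for every single delete you want to explain.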

Stefan
Received on 2010-01-05 11:30:30 CET
