Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

From: Talden <talden_at_gmail.com>
Date: Sat, 22 Mar 2008 01:37:12 +1300

> (2) there is no API call "whereis", defined as follows:
> whereis(URL:rev, targetbranchURL:rev), tells you where a file
> identified by URL:rev (on a source branch, for example), can be found
> in the targetbranch (note: whereis may return [0:N] URLs because of
> possible cloning on the target branch). (Note that this requires
> the ability to search "forward" through the logs efficiently, a
> feature that Subversion does not provide right now AFAIK.)
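To show the cost this thread is trying to avoid, here is a minimal sketch of a naive "whereis" that replays every later revision and follows copies forward. The names (`Copy`, `whereis_forward_scan`, the `history` dict) are hypothetical. The sketch also assumes that a file is identified by path@rev exactly as a copy source names it, which is a simplification of real Subversion node identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy made in a revision: src_path@src_rev -> dst_path@dst_rev."""
    src_path: str
    src_rev: int
    dst_path: str
    dst_rev: int


def whereis_forward_scan(history, path, rev, target_prefix):
    """Naive whereis(): replay every revision after `rev`, following copies.

    `history` maps revision number -> list of Copy records made in it.
    Returns sorted (dst_path, dst_rev) pairs under `target_prefix` that
    descend by copying from path@rev; the result may be empty (0..N).
    The cost grows with the number of later revisions, not with the answer.
    """
    reachable = {(path, rev)}  # path@rev nodes known to descend from the start
    found = []
    for r in sorted(history):
        if r <= rev:
            continue
        for c in history[r]:
            if (c.src_path, c.src_rev) in reachable:
                reachable.add((c.dst_path, c.dst_rev))
                if c.dst_path.startswith(target_prefix):
                    found.append((c.dst_path, c.dst_rev))
    return sorted(found)
```

Every query walks the whole tail of the history, which is exactly the forward log search the quoted text says Subversion lacks an efficient form of.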

I'd been thinking about this Copied-To information recently, and about
how it might be stored in an append-only system in a way that is cheap
to store but avoids some of the cost of a complete forward scan of all
revisions.

If we think of all paths in the repository as currently living in a
'File-System' name-space, then the path "ant.txt" at the repository
root is really "FS:/ant.txt".

We could use the same skip-delta logic to build up content in a second
name-space, 'Copied-To' (CT:), whose content clients would consume
during log operations.

E.g., let's say we performed the copy...

FS:/ant.txt@r1 --> FS:/bat.txt@r3

...then "CT:/ant.txt/r1" could have the following added to its content...

bat.txt@r3

...if we then did...

FS:/ant.txt@r1 --> FS:/cat.txt@r4
FS:/bat.txt@r3 --> FS:/dog.txt@r4

...we would change "CT:/ant.txt/r1" to...

bat.txt@r3
cat.txt@r4

...and add the following to "CT:/bat.txt/r3"

dog.txt@r4
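The worked example above can be modelled in a few lines. This is a hypothetical in-memory sketch (`ct_key`, `CopiedTo` and `record_copy` are illustrative names, not Subversion APIs): each copy appends one line to the Copied-To node of its *source*, keyed as "CT:/<path>/r<rev>".

```python
from collections import defaultdict


def ct_key(path, rev):
    """Name of the Copied-To node for path@rev, e.g. 'CT:/ant.txt/r1'."""
    return f"CT:/{path}/r{rev}"


class CopiedTo:
    """Hypothetical in-memory model of the 'Copied-To' name-space."""

    def __init__(self):
        self.nodes = defaultdict(list)  # CT key -> lines of 'dst@rN'

    def record_copy(self, src_path, src_rev, dst_path, dst_rev):
        # Each copy appends one line to the *source's* CT node.
        self.nodes[ct_key(src_path, src_rev)].append(f"{dst_path}@r{dst_rev}")

    def content(self, path, rev):
        return "\n".join(self.nodes.get(ct_key(path, rev), []))


ct = CopiedTo()
ct.record_copy("ant.txt", 1, "bat.txt", 3)
ct.record_copy("ant.txt", 1, "cat.txt", 4)
ct.record_copy("bat.txt", 3, "dog.txt", 4)
print(ct.content("ant.txt", 1))  # prints two lines: bat.txt@r3 and cat.txt@r4
```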

Copied-To information is now available by reading the HEAD revision of
the relevant file in the Copied-To name-space. It should also be cheap
to ask "at which of its revisions was this file copied, and when and
where was each copy made?"
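Both queries can be sketched against a per-revision record of CT changes. Here `ct_revisions[r]` is assumed to map each CT key changed in revision r to its full new content as (dst_path, dst_rev) pairs; the function names are hypothetical, and the HEAD lookup is a linear scan purely for illustration (a real repository would locate the node's latest revision directly).

```python
def parse_ct_key(key):
    """'CT:/ant.txt/r1' -> ('ant.txt', 1); the final '/r<N>' is the revision."""
    head, _, rev = key.rpartition("/r")
    return head[len("CT:/"):], int(rev)


def ct_head(ct_revisions, key):
    """HEAD content of one CT node (linear scan, for illustration only)."""
    for changes in reversed(ct_revisions):
        if key in changes:
            return changes[key]
    return []


def copies_of(ct_revisions, path):
    """Every (src_rev, dst_path, dst_rev) copy made from any revision of `path`."""
    latest = {}
    for changes in ct_revisions:  # later revisions overwrite earlier content
        for key, content in changes.items():
            if parse_ct_key(key)[0] == path:
                latest[key] = content
    return sorted(
        (parse_ct_key(key)[1], dst, dst_rev)
        for key, content in latest.items()
        for dst, dst_rev in content
    )
```

With the example history, `copies_of(..., "ant.txt")` yields both copies of ant.txt@r1 without touching any FS: revision.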

Naturally, the file need not be textual; it should probably be sorted,
and it should probably use internal identifiers rather than textual
filenames for source and destination.
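One possible non-textual layout, sketched under the assumption that each destination has a hypothetical integer node id: sorted, de-duplicated, fixed-width records of (dst_node_id, dst_rev). Sorting allows a binary search without decoding the whole node, and adding one record should leave most bytes unchanged, which suits binary deltas. The 16-byte layout is an illustration, not an existing Subversion format.

```python
import struct

# Hypothetical record layout: (dst_node_id, dst_rev) as two big-endian uint64s.
ENTRY = struct.Struct(">QQ")


def encode_ct(entries):
    """Serialize (dst_node_id, dst_rev) pairs as sorted, de-duplicated records."""
    return b"".join(ENTRY.pack(node_id, rev) for node_id, rev in sorted(set(entries)))


def decode_ct(blob):
    """Inverse of encode_ct(); returns the pairs in sorted order."""
    return list(ENTRY.iter_unpack(blob))


def contains(blob, node_id, rev):
    """Binary search over the fixed-width records without decoding them all."""
    count = len(blob) // ENTRY.size
    target = (node_id, rev)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if ENTRY.unpack_from(blob, mid * ENTRY.size) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < count and ENTRY.unpack_from(blob, lo * ENTRY.size) == target
```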

Merge logic for this name-space is also different: I think conflicts
are always resolvable with a 'keep both' approach.
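A minimal sketch of that 'keep both' merge for one CT node, assuming entries are only ever added (each one records a copy that has already happened). `merge_ct` is a hypothetical name.

```python
def merge_ct(ours, theirs):
    """Merge two versions of one CT node by keeping both sides' entries.

    Assuming entries are only ever added, there is nothing to reconcile:
    the result is simply the sorted union of both sides.
    """
    return sorted(set(ours) | set(theirs))


ours = [("bat.txt", 3)]
theirs = [("bat.txt", 3), ("cat.txt", 4)]
merged = merge_ct(ours, theirs)
```

Because a set union is commutative, associative and idempotent, the order in which branches are merged would not change the result.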

Note that this probably doesn't make looking up Copied-To information
cheap, but I think that for many use cases it will make it cheaper.
Because it benefits from binary diffs and skip-deltas, this shouldn't
be a huge additional storage burden. The delta is added in the same
revision in which the copy is performed, so it changes neither the
atomic nature of operations nor the append-only nature necessary for
syncing mirrors.
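A toy sketch of that point, using a hypothetical `Repo` class rather than Subversion's real FS API: the FS: copies and the CT: updates are collected first and then appended as one new revision, and no earlier revision is ever modified. It replays the ant/bat/cat/dog example.

```python
class Repo:
    """Toy append-only repository: each revision is a dict of changed nodes.
    Hypothetical model, not Subversion's FS API."""

    def __init__(self):
        self.revisions = [{}]  # r0 is the empty revision

    def head(self, key):
        """Latest content of `key`, or None if it was never written."""
        for changes in reversed(self.revisions):
            if key in changes:
                return changes[key]
        return None

    def commit(self, changes):
        """Append one revision; everything in `changes` lands together."""
        self.revisions.append(dict(changes))
        return len(self.revisions) - 1

    def commit_copies(self, copies):
        """Perform (src_path, src_rev, dst_path) copies plus their CT updates
        in a single new revision."""
        new_rev = len(self.revisions)
        changes = {}
        for src_path, src_rev, dst_path in copies:
            key = f"CT:/{src_path}/r{src_rev}"
            old = changes.get(key) or self.head(key) or ()
            changes[f"FS:/{dst_path}"] = ("copied-from", src_path, src_rev)
            changes[key] = tuple(sorted(set(old) | {(dst_path, new_rev)}))
        # FS copies and CT deltas are built first, then appended as one
        # revision: in this model, either all of them exist or none do.
        return self.commit(changes)


repo = Repo()
repo.commit({"FS:/ant.txt": "ant"})                      # r1
repo.commit({})                                          # r2
repo.commit_copies([("ant.txt", 1, "bat.txt")])          # r3
repo.commit_copies([("ant.txt", 1, "cat.txt"),
                    ("bat.txt", 3, "dog.txt")])          # r4
```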

I'm curious whether anyone else sees this as a solution with any merit
within the scope of the existing SVN architecture. Is this at least
thought-provoking enough to prompt some discussion on how we might make
Copied-To information cheaper? Would this be useful in building
revision graphs that trace tagging and branching as well as
modification?

--
Talden

Received on 2008-03-21 13:37:24 CET
