Julian Foad wrote on Thu, Apr 30, 2015 at 10:30:39 +0100:
> Daniel Shahaf wrote (quoting two emails combined):
> > Okay. So what you're saying so far is that the data model will have
> > distinct concepts for "copying" and "branching".
> > Presumably [...] some high-level operations
> > will behave differently if the object operated upon is a branch compared
> > to if it is a plain copy. The interesting question is what those
> > differences will be.
> Usually, in practice, many elements are branched together, and this is
> reflected in the model. Unlike copying, a branch contains a *set* of
> elements. New elements can be added to the set later: this happens,
> for example, when we merge the changes into this branch from a source
> branch in which new elements have been created. Elements removed from
> the branch can be re-added later as the same element ("resurrection").
Okay, so an element is a node (in the svn_node_kind_t sense), and
a branch operation forks a set of elements. Is that set necessarily
a subtree rooted by some node, or could I, say, create a branch that
contains only the elements subversion/*/main.c and no others?¹
You mentioned in an earlier mail that a branch is conceptually "mounted"
into the fspath space. Is it possible to mount a branch at more than
one point? (If yes, we'd immediately have in-repository hard links.)
¹ That particular glob pattern doesn't match since r1415273, but it
illustrates my point well.
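The first question above ("necessarily a subtree rooted by some node?") can be stated as a precise predicate on a set of elements. Below is a minimal sketch. It assumes a hypothetical toy representation in which each element records only its parent element id; this is not the representation used on the branch.

```python
from typing import Dict, Optional, Set

# Hypothetical toy representation (not Subversion's): element id -> parent id,
# with None as the parent of the branch's root element.
ParentMap = Dict[int, Optional[int]]


def is_rooted_subtree(members: Set[int], parent: ParentMap) -> bool:
    """True if `members` is one element together with all of its descendants."""
    if not members:
        return False
    # Candidate roots: selected elements whose parent is not selected.
    roots = [e for e in members if parent.get(e) not in members]
    if len(roots) != 1:
        return False
    root = roots[0]

    def under_root(e: Optional[int]) -> bool:
        # Walk up the parent chain (assumes the map is a tree, no cycles).
        while e is not None:
            if e == root:
                return True
            e = parent.get(e)
        return False

    return {e for e in parent if under_root(e)} == members
```

Take a tree in which elements 3 and 5 are two main.c files in different directories. Then is_rooted_subtree({3, 5}, parent) is False because that selection has two roots. Whether the model should permit a branch like that is exactly the open question.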
> This is a significant difference from the old model of 'copying' where
> each element (each file and directory) gets an independent new id.
> And it matters particularly when you have the ability to move elements
> around in the tree -- you need a way to track which elements are
> members of which branch.
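The contrast Julian draws between copying and branching can be sketched in a few lines. This sketch assumes a hypothetical toy model: an element instance is a (parent element id, name) pair, and a branch is a dict from element id to instance. A plain copy renumbers every node independently, while a branch keeps the same element ids as one set.

```python
import itertools
from typing import Dict, Optional, Tuple

# Toy model (hypothetical): an element instance is (parent eid, name), and a
# branch is a dict mapping element id (eid) -> instance.
Instance = Tuple[Optional[int], str]
Branch = Dict[int, Instance]

_fresh_ids = itertools.count(100)  # source of brand-new ids for plain copies


def plain_copy(src: Branch) -> Branch:
    """Old-style copy: every file/dir gets an independent new id."""
    renumber = {eid: next(_fresh_ids) for eid in src}
    # renumber.get(None) is None, so the root's missing parent stays None.
    return {renumber[eid]: (renumber.get(p), name) for eid, (p, name) in src.items()}


def branch(src: Branch) -> Branch:
    """Branch: a new container holding instances of the *same* elements."""
    return dict(src)
```

Because branch() keeps the ids, a merge can pair element 2 on trunk with element 2 on the branch even after either side renames or moves it. With plain_copy() that pairing has to be reconstructed from history.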
That's probably not what you meant, but what would 'svn mv ^/foo/iota
^/bar/kappa' do, when /foo and /bar are related branch roots? And when
/foo is a branch root and /bar is a path-wise-child of ^/ created by
I suspect those operations are undefined in the "branches data model"
layer, but defined in the lower-level FS API layer. But maybe there is
a way to make at least the first well-defined?
> In the existing Subversion back-end, on copying a directory, each
> file and directory in it gets a new copy-id once lazy copying is
> undone. (For modelling purposes, lazy-copying hardly matters. As a
> thought experiment, imagine on top of the old Subversion back-end we
> build a version control system that, as part of its design, always
> makes a small modification to each file and directory when copying.
> Thus we never see 'lazy copying' and always see a new id for each
Sure, lazy copying in particular and copy-ids in general are
implementation details of the FS backends. More generally, we can
identify four API layers: the DAG layer, the FS layer (svn_fs.h), the
new data model for branches/moves, and higher-level functionality on top
of the latter.
> These new ids are independent for each file/dir: there is
> no way to group them into sets.
In the last sentence, "new ids" refers to the existing FS layers' copy
ids in the example, not to the branch/element ids in the mv-t-2 branch,
Let me see if I understand. IIUC, in the new model, elements would know
what branch they belong to, and would have unique ids that persist
through renames within the branch and delete/resurrect cycles. Element
ids would be unique throughout the entire repository, right?
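The reading above (an element id survives renames within the branch and delete/resurrect cycles) can be pinned down as three operations. This is a minimal sketch on a hypothetical representation, a dict from element id to (parent element id, name). The function names are illustrative, not svnmover commands.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: element id -> (parent eid, name). Illustrative only.
Branch = Dict[int, Tuple[Optional[int], str]]


def rename(b: Branch, eid: int, new_parent: int, new_name: str) -> None:
    """A move/rename changes the element's instance, never its id."""
    if eid not in b:
        raise KeyError(f"element {eid} is not in this branch")
    b[eid] = (new_parent, new_name)


def delete(b: Branch, eid: int) -> None:
    """Remove the element's instance from the branch."""
    del b[eid]


def resurrect(b: Branch, eid: int, parent: int, name: str) -> None:
    """Re-add a previously deleted element under its *old* id."""
    if eid in b:
        raise ValueError(f"element {eid} is already present")
    b[eid] = (parent, name)
```

A sequence rename → delete → resurrect leaves the element under its original id. That is what lets a later merge recognise it as "the same element".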
As an alternative, element ids could be made unique only throughout
their branch and all branches that are copy-wise-descendants of the
copy-wise-primogenitor of the branch that contains them — so, for
example, /subversion/(trunk|branches/*) are one "element id space", but
/httpd/httpd/(trunk|branches/*) would be (together) another "element
id space". After all, merging from httpd/trunk to subversion/trunk
isn't defined; letting them have distinct element-id spaces would
model that undefinedness. (I'm not sure how this compares with
"branch families", which had been eradicated before I took a close look
at the branch. Let me know if I just rehashed something that's been
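The per-family alternative only needs one question answered: which branch is the copy-wise primogenitor of a given branch? Below is a minimal sketch. It assumes a hypothetical "copied_from" table recording, for each branch root path, the branch it was created from (None if it was created from scratch). Two branches then share an element-id space exactly when they have the same primogenitor. The example paths are illustrative.

```python
from typing import Dict, Optional

# Hypothetical input: branch root path -> source branch (None = created fresh).
CopiedFrom = Dict[str, Optional[str]]


def primogenitor(branch: str, copied_from: CopiedFrom) -> str:
    """Follow copy-from links back to the branch that was created fresh."""
    seen = set()
    while copied_from.get(branch) is not None:
        if branch in seen:
            raise ValueError(f"copy-from cycle at {branch}")
        seen.add(branch)
        branch = copied_from[branch]
    return branch


def same_eid_space(a: str, b: str, copied_from: CopiedFrom) -> bool:
    """True if merging between `a` and `b` is defined under this scheme."""
    return primogenitor(a, copied_from) == primogenitor(b, copied_from)
```

Under this scheme, same_eid_space("/httpd/httpd/trunk", "/subversion/trunk", ...) is False, which models the undefinedness of merging between the two projects.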
> Another significant property of a branch in this model is the
> one-to-one correspondence between the (instances of) elements in this
> branch and those in another branch, for the elements that appear in
> both of the branches.
I'm not sure I understand. Are you saying that element ids allow us to
easily answer the question "What has element X on this branch been
renamed to on that branch"? If so, then yes, it does, but how does it
handle bifurcations (such as r1024269) and variants (svn sw ^/branch; svn
cp blue.c navy.c; svn cp blue.c cyan.c; svn rm blue.c) and merges
(suppose we decide to reverse-merge r1024269 into trunk)?
Basically, "what has X been renamed to" is a one-to-(either 0 or 1)
relation, but with bifurcations/variants/merges, N-to-M relations might
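The one-to-(0 or 1) lookup is easy to express with shared element ids, and so is its limit. Below is a minimal sketch on a hypothetical representation (element id mapped to a (parent element id, name) pair). It also assumes, as Julian's description of copying suggests, that an in-branch "svn cp" creates elements with new ids. Under those assumptions, the variants example yields no counterpart at all for blue.c.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: element id -> (parent eid, name); the root's name is "".
Branch = Dict[int, Tuple[Optional[int], str]]


def path_of(b: Branch, eid: int) -> str:
    """Build the element's relative path by walking up to the branch root."""
    parts = []
    cur: Optional[int] = eid
    while cur is not None:
        parent, name = b[cur]
        if name:
            parts.append(name)
        cur = parent
    return "/".join(reversed(parts))


def counterpart(eid: int, there: Branch) -> Optional[str]:
    """0-or-1 answer: where element `eid` lives in branch `there`, if anywhere."""
    return path_of(there, eid) if eid in there else None
```

A rename maps cleanly. In the variants case, however, navy.c and cyan.c carry new ids, so counterpart(blue_id, ...) returns None. Capturing "blue.c became navy.c and cyan.c" would need extra N-to-M metadata.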
For the "reverse-merge r1024269" example (creating one element that's
logically a combination of several other elements), perhaps we could
have metadata on the resulting (combined) element that lists _all_
elements that are logically its immediate parents. (That's similar to
how git octopus merges work: they create a commit object that has an
arbitrary number of other commit objects listed as its parents. (Git
models history as a DAG of filetrees, comparable to svn's array of
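The octopus-merge analogy suggests element metadata with an arbitrary-length list of logical parents, which turns element ancestry into a DAG walk. Below is a minimal sketch. The ElementInfo type and its "parents" field are hypothetical names for illustration, loosely mirroring how a Git commit may list several parent commits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ElementInfo:
    # Hypothetical metadata: every element this one logically derives from.
    parents: List[int] = field(default_factory=list)


def ancestors(eid: int, info: Dict[int, ElementInfo]) -> Set[int]:
    """All elements reachable through `parents` links (iterative DAG walk)."""
    seen: Set[int] = set()
    stack = list(info[eid].parents)
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(info[p].parents)
    return seen
```

For the reverse-merge example, a combined element would list every element it was formed from. A later question such as "does this element descend from X?" then becomes a reachability test.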
Tangentially related, should /@0 be a branch root? It seems like asking
for trouble to allow one branch to be both a copy-wise-ancestor and
path-wise-ancestor of another, and I don't see what use-case it serves
other than users who accidentally created their contents in ^/ rather
than in ^/trunk. I assume making /@0 not-a-branch might mean nasty
special-cases throughout the code, though… so perhaps make ^/ a branch
that cannot itself be branched? i.e., have 'svnmover branch "" foobar'
always fail regardless of the value of 'foobar'?
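The proposed rule ("^/ is a branch but cannot itself be branched") amounts to a single guard on the source of a branch operation. Below is a minimal sketch. The make_branch function, the BranchingError type, and the registry dict are hypothetical and are not svnmover's actual code.

```python
from typing import Dict


class BranchingError(Exception):
    """Raised when a branch operation is not permitted (hypothetical)."""


def make_branch(branches: Dict[str, str], src: str, dst: str) -> None:
    """Record `dst` as a branch of `src`; `branches` maps branch path -> source."""
    if src.strip("/") == "":
        # Proposed rule: ^/ may never be a branch *source*, so it never becomes
        # a copy-wise ancestor of the branches nested (path-wise) beneath it.
        raise BranchingError("cannot branch the repository root")
    if dst in branches:
        raise BranchingError(f"{dst!r} is already a branch")
    branches[dst] = src
```

With this guard, the equivalent of 'svnmover branch "" foobar' fails for every value of foobar, while branching trunk proceeds normally.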
> > Should "branching" be an atomic concept of the model, or a derived one?
> > That is, we could define "branch" as "a node created from an existing
> > node via the 'svnmover branch' API", or we could define branch in terms
> > of lower-level operations?
> A 'branch' is an atomic (not derived) concept of the model. And, as
> said above, a branch is not a node, it's a container containing
> instances of a set of elements.
> > By the way, if the data model has a first-class 'branch' operation, will
> > it also have a first-class 'tag' operation?
> It will be possible to use a 'branch' as a 'tag' in the same limited
> sense as in the old model. I have not looked at introducing a
> different kind of tagging.
Okay. The immediate idea would be to define new-model tags as immutable
new-model branches, but I haven't thought hard about this.
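The idea of tags as immutable new-model branches could look roughly like the following. This is a minimal sketch under stated assumptions: the Branch class, its "immutable" flag, and make_tag are hypothetical, and the tag reuses the source branch's element ids like any other branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Branch:
    # element id -> (parent eid, name); toy representation, hypothetical.
    elements: Dict[int, Tuple[Optional[int], str]] = field(default_factory=dict)
    immutable: bool = False  # hypothetical flag: True means "this is a tag"

    def set_element(self, eid: int, parent: Optional[int], name: str) -> None:
        if self.immutable:
            raise PermissionError("tags are immutable")
        self.elements[eid] = (parent, name)


def make_tag(src: Branch) -> Branch:
    """A tag: a branch with the same elements that rejects all later edits."""
    return Branch(elements=dict(src.elements), immutable=True)
```

Because the tag keeps the same element ids, merge and diff logic would need no special case for tags. Only mutation is refused.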
> > And where would the
> > distinction between 'feature branches', 'stabilization branches', etc.,
> > live?
> That would live in a higher layer. The model I'm presenting attempts
> only to provide sufficient foundation for move tracking in merging.
> The branches in the model have no classes of relationship between
> them: they are all the same. I envisage that kind of functionality
> being built in a layer on top of this model.
Great. Mapping these layers to our existing library structure is going
to be interesting, I think ☺
> Thanks again for your thoughtful questions. How much is making sense?
I think I finally have a good understanding of what model you're
proposing and what high-level features it aims to enable. Thanks for
all the explanations!
> - Julian
P.S. Following up on the ls-br-r output discussion, I really like the
new ls-br-r output. It is very readable, in my opinion. The
"datum=value" syntax is great for having both human and machine
parseability, while still being dense. (I should have thought of it;
I've used it too.)
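To illustrate the machine-parseability point, whitespace-separated "datum=value" tokens split apart with the standard library alone. Below is a minimal sketch. The sample line in the usage note is invented for illustration and is not actual ls-br-r output. The parser also assumes values contain no spaces.

```python
from typing import Dict


def parse_datum_line(line: str) -> Dict[str, str]:
    """Parse whitespace-separated key=value tokens; tokens without '=' are skipped."""
    fields: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields
```

For a made-up line such as "branch=B1 root=trunk/foo", this returns a dict with the two fields. A human can still read the line directly.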
Received on 2015-05-01 04:29:59 CEST