[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: RE: Re: Maintaining NodeID sanity

From: Bill Tutt <rassilon_at_lyra.org>
Date: 2002-05-08 09:37:27 CEST

Oh blech, I hate DAGs and lazy actions, it makes life much more
complicated.
Snippets below about how things might change if I actually think about
each case related to the lazy branching scheme.

> From: Bill Tutt [mailto:rassilon@lyra.org]
> > From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> > > From: Bill Tutt [mailto:rassilon@lyra.org]
> > > For BDB you could always create a separate NodeChangeAnecstorBlob
> table
> > > if it's deemed that it makes more perf sense that way.
> >
> > This is the assertion I need to be comforted about... :-)
> >
> > Blobs are okay unless you're searching in them often for
information.
> > If we have to constantly load and/or scan a whole blob to find out
> > whether node FOO is an ancestor of BAR or not, that seems like a bit
> > of downgrade from the current system.
> >
> > Am I worrying about nothing? Look in the filesystem at where we do
> > ancestry and predecessor checks... I just don't see how it can
> > possibly be efficient enough to use this ancestry mechanism :-(.
> >
> Most of these answers rely on BranchID being implemented since they
> restore NodeID to sanity and lets me make the following comments:
>
> Let's define a handy function in (NodeID, BranchID, ChangeSetID) land
> because we'll be seeing it a lot:
> AreNodeChangesTimelesslyRelated(Source, Target)
> {
> return (Source.NodeID == Target.NodeID &&
> Source.BranchID == Target.BranchID);
> }
>
I'm currently guessing that by the time this function is called all
mutability rules for lazy BranchID application will have kicked in. See
the discussion about directory auto-merging for specifics.

> Ok, let's check our ancestry examination functions out:
> svn_fs__id_is_parent: only svn_fs_txn_path_is_id uses this function.
> Returns true if A is a parent of B. i.e. if A == B then false.
>
> svn_fs__id_is_parent is only called by:
>
> svn_fs_txn_path_is_id: mod_dav_svn uses this function. I don't know
why
> mod_dav_svn uses this function. I don't quite understand the WebDAV
> context this is occurring in, so I can't analyze this one very well.
I'm
> guessing WebDAV is trying to figure out if a directory is the correct
> version or not, and it only wants to succeed if it cheaply appears to
be
> auto-mergable. So I'd guess a simple check with
> AreNodeChangesTimelesslyRelated should do the trick.
>
Don't have any idea what Branches mean here.

> svn_fs__id_is_ancestor: The only caller of this is directory merging.
> Returns true if A == B or A is parent of B
>
> Regarding the directory merging call site:
[...snip...]
> Case 1: Used to verify that Source is ordered after Target which is
> ordered after Ancestor in a (NodeID, BranchID, ChangeSetID) sense
after
> the schema change.
>
> During a commit, this is guaranteed by our caller, relaxing to
> AreNodeChangesTimelesslyRelated is ok here.
>

If you have the example:

/foo/main/A
/foo/alt/A

and alt is a branch of main, but A hasn't been modified under alt yet.

/foo/alt/A is currently (</foo/main/A NodeID>, NULL, ChangeSet0)
User 1 commits a change to /foo/alt/A.
The process of acquiring a mutable NodeChange yields a Source of:
 (</foo/main/A NodeID>, </foo/alt BranchID>, ChangeSet1)
This is compared against a Target NodeChange of:
 (</foo/main/A NodeID>, NULL, ChangeSet0)

In this case Ancestor == Target and so it's just allowed to succeed.

Since BranchID changes during the "acquire a mutable node phase", I
don't think Case 1 will ever get hit for a weird non Source.BranchID !=
Target.BranchID when the NodeID's are identical. Since this would imply
some bizarre operation like: "branch /foo/alt /foo/alt" which is illegal
in any operation that doesn't also involve either Target or Source not
existing.

> Case 2: Used to detect unrelated-ness between Source and Transaction
> We know that S != T because we just got done checking that.
>
> During a commit, S can't be a successor of T because T hasn't been
> committed yet, so we could optimize the ancestor check out.
>

This isn't really affected by Branching.

> svn_fs_id_distance is called by the following functions:
>
> svn_fs_check_related: Uses it to prove that they're related via
ancestry
> somehow. (including via copy history)
>
> The new schema offers check_related some more attempts for
optimization
> before it has to call id_distance.
>
> However, check_related isn't called atm!
>

Node ancestry isn't truncated when performing a branch, so check_related
keeps on working for Branches. So, you still have to do the hard work.
Whee!

> tree.c:merge(): Directory merging
> Source == proposed HEAD NodeChange for directory
> Target == current HEAD NodeChange for directory
> Ancestor == common parent NodeChange for directory
> Case 1: Used to re-id a directory node if necessary due to trying to
> auto-resolve directory versioning, and trying to be nice about
> generating complete auto-merge history for directories.
>
> I don't think this makes any sense any more given the new NodeChange
PK.
> A successor is simply a new (NodeID, BranchID, ChangeSetID) tuple
that's
> just later in time. It's not even like the auto-merging delta details
> operation are recorded separately, the original version of the
> NodeChange is deleted.
> This made more sense with the old NodeRevisionIDs.
>

Ditto. No change here for Branches.

> Case 2: Target recreated something that was deleted in Source.
> This is essentially an out-of-dateness check. If you're going to
delete
> something in Source that was updated in Target, then you should have a
> more up to date directory then the one you tried to commit with.
Update
> and retry your svn commit.
>
> Case 2 is an annoying race condition, but it's very important to
double
> check it. Fortunately, this code just needs to know if
> AreNodeChangesTimelesslyRelated. (i.e. they differ only in
ChangeSetID)
>
> The reason for this is that time order between Target and Source never
> applies for this case. Source doesn't exist to compare against, and
> Ancestor is always <= Target by definition.
>

This doesn't change at all. Deletion doesn't alter this case at all.
Branches don't do weird things with Deletions.

> Libsvn_repos/delta.c:
> repos_dir_delta():
> Only called from mod_dav_svn, svnlook, libsvn_repos/log.c, and
> libsvn_repos/reporter.c
> Used to détermine the appropriate delta operation to perform/report
> based on the returned id distance.
>
> This can use our new found friend AreNodeChangesTimelesslyRelated
> because the time order isn't important here.
>

This is actually only used from a logging data scavenging point of view,
so using fs_distance_id would work. Esp. since the source/target dirs
aren't likely to be next to each other.

> Find_nearest_entry:
> /* Find the distance between the target entry and this source
> entry. This returns -1 if they're completely unrelated.
> Here we're using ID distance as an approximation for delta
> size. */
>
> No escape here, we have to do the work atm, unless we want to use a
> different heuristic.
>

Nothing different here.

> delta_dirs:
> /* Check the distance between the ids.
>
> 0 means they are the same id, and this is a noop.
>
> -1 means they are unrelated, so try to find an
ancestor
> elsewhere in the directory. Theoretically, using an
> ancestor as a baseline will reduce the size of the
> deltas.
>
> Any other positive value means the nodes are related
> through ancestry, so go ahead and do the replace
> directly. */
> We can use AreNodeChangesTimelesslyRelated here as well. If
> AreNodeChangesTimelesslyRelated returns FALSE, then we'll call
> find_nearest_entry which will do the really expensive work for us. No
> need to call the expensive check twice for one chunk of data.
>

Well, scratch that, we need to be able to crawl backwards over BranchID
changes boundaries in the direct line of change. So fs_distance_id would
work here too, and our new helper wouldn't.

> The important conclusion:
> The directory auto-merging process is now ancestor checking free for a
> commit. However, delta editor work still can require arbitrary
ancestry
> hopping, but only some of the time (when find_nearest_entry gets
> called). Additionally, anybody who ever calls svn_fs_check_related is
> just going to have to wait, but that was true before the schema
change.
>

Good news is that the summary is still the same.

> Phew... talk about a lot of work,
> Bill
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed May 8 09:38:28 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.