Re: Greg Stein's several issues: proposed resolutions

From: Greg Stein <gstein_at_lyra.org>
Date: 2000-12-22 12:54:19 CET

On Thu, Dec 21, 2000 at 03:17:04PM -0600, Karl Fogel wrote:
> Greg, a few different threads discussed in one email here; I know
> that's not ideal mailing list technique, but in this case it may help
> keep things organized.

Actually, it's fine if it is a "closing summary." But yah... if a bunch of
discussion ensues, then it can be problematic :-)

>...
> 1. Should property names be URIs?
>
> (Oh, I see you just sent a mail saying "I relent, for now.". Heh.
> Anyway, I'll describe this issue briefly so we all agree what we're
> talking about.) The two sub-questions are:
>
> a) Subversion-specific prop names need a unique prefix. Should
> that prefix be "svn:" or some longer URI? We lobby for plain
> old "svn:" -- it's easier to work with, and the namespace
> protection is frankly about the same.

Well, I'm hoping we can get the IANA to register that as an official URI
scheme. If they do, then it *will* be a URI and it will be unique. We
wouldn't need anything longer. If/until that point, a longer name would
(strictly) be safer.

> b) Should all other prop names always be URIs? When the user
> doesn't specify a URI, should Subversion force it into a URI?
> We say no, let the namespace sort itself out. This has worked
> fine with many other systems in the past, and our experience
> (which may or may not be the same as yours) suggests that URI
> schemes don't really improve matters, they just make everyone
> work with longer strings.
>
> Were those the two questions you understood to exist, too, and
> regarding which you relented ("for now")? :-)

Yup and yup.

> 2. Detecting revision-shift during an update.
>
> Suppose you're running an update. The client first reports local
> state, and then starts getting an appropriately-tuned tree delta
> back from the server. But immediately after reporting local state,
> a smaller, quicker update happens to some file that's also involved
> in the larger update. Now the larger incoming tree delta will be
> wrong.
>
> Question: Should we rejigger the editor interface to detect this
> case?
>
> Answer: No, it can be handled entirely with client side bookkeeping
> or locking, and should be thus handled. Adding new arguments would
> result in only minimally more convenience in this one case, and
> the arguments would be ignored baggage in most other cases.

Bookkeeping: you could end up storing a lot of state on the client; this
doesn't fit well with our intended "streamy" model. We'd have to get sneaky
to keep our memory footprint down when updating large repositories (ask Jim
to consider an "svn update" for the GNU tools repository).

Locking: you'd need to lock every dir involved in the update; the more dirs
you lock, the more that will need to be forcefully unlocked if a crash or
"kill" happens. The locks may prevent other operations from happening, such
as a "diff" or a "status" while the big-ass update is occurring.

> (I'm not sure you were firmly advocating rejiggering the editor for
> this, actually.

Nope. But it seemed the best way to avoid keeping a lot of state or locking
the tree.

> Anyway, I just wanted to reassure you that the
> client can detect this independently of the delta coming back from
> the server, either by noticing whether the file has changed since
> state was reported, or by not permitting it to change after state
> gets reported until the update is done.)

Agreed, it's possible. I'm not sure it is the best approach, however.
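
To make the "notice whether the file has changed since state was reported"
option concrete, here is a minimal sketch. It is Python rather than our C,
and every name in it (Snapshot, snapshot, changed_since) is made up for
illustration; nothing like it exists in the client. It also shows the cost:
one record per file has to be held from report time until the delta for
that file arrives.

```python
import hashlib
import os
from typing import NamedTuple


class Snapshot(NamedTuple):
    """What the client remembers about one file at report time (hypothetical)."""
    size: int
    digest: str


def _digest(path: str) -> str:
    # Hash in chunks so a large file is never read into memory at once.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path: str) -> Snapshot:
    """Record the file's state at the moment local state is reported."""
    return Snapshot(os.stat(path).st_size, _digest(path))


def changed_since(path: str, snap: Snapshot) -> bool:
    """True if the file no longer matches the state that was reported."""
    if os.stat(path).st_size != snap.size:
        return True  # cheap check first; no need to hash
    return _digest(path) != snap.digest
```

The update would call changed_since() just before applying the incoming
delta to a file, and bail out (or re-report) if it returns true.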

> 3. The problem of copying or moving a file on top of another file.
> How do you express the ancestry of both the thing being replaced,
> and the thing that replaced it?
>
> Ben said "You first do delete(), then add_file() or
> add_directory()." Fine, but there's a corollary to this: the
> directory in which all this is happening must be up-to-date,
> because changes to file identity (as opposed to contents) are
> changes to its parent directory. So the delete() will return an
> out-of-date error if the dir's revision reveals out-of-dateness,
> which is what we want.

Makes sense.

However, this also implies that to delete "foo.c", I need to update the
directory, which might have an adverse effect on what I was working on. I'd
like to be able to delete foo.c in isolation.

>...
> 4. Issue of how repository paths are stored in the working copy
> metadata.
>
> You're right -- the `repository' file is a holdover from CVS
> thinking, and is not the appropriate way to do things here.
> Instead, each entry can store this information; and by default
> entries inherit the attribute from their directory (that is, the
> entry whose name is SVN_ENTRIES_THIS_DIR). This will get us
> repository mixing cleanly, except, of course, for writing the code
> to do it. :-)
>
> After all, the originating repository is really just part of the
> ancestry information, which is already stored in the entries! It
> only makes sense for additional ancestry information to hang off
> them, too.

You're separating the repository from the item's name again. You've just
moved it from the "repository" file into an attribute of
SVN_ENTRIES_THIS_DIR. Same concept, new location :-)

Consider:

vs.

The former case requires all files to be in the same repository; the latter
allows files from different repositories. Brane was arguing for this a while
back. I'm not keen on it myself :-), but simply recording a full URL means
you don't have to go and compose a bunch of stuff. The "name" attribute even
becomes a bit redundant.

Also consider what happens when you move/copy a file from a different
directory. In the former case, you might end up with problems trying to
resolve the repository/path information [oh... missed the "inherit" thing;
the copied file could spec a different repository, overriding the parent].
The latter allows for easy reference to wherever a file may have come from.

I'd just say that we specify the URL and don't try to do any inheriting.
This kind of depends on the operations that consume that value, of course.
Will a URL always be okay?
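
A rough sketch of the two lookups, only to show the difference in what a
consumer has to do. The dict layout, the "url" key, and the THIS_DIR
placeholder are assumptions made up for this illustration; they are not the
real entries format, and example.org stands in for "some other repository".

```python
from typing import Dict

# Placeholder for SVN_ENTRIES_THIS_DIR; the real value is not assumed here.
THIS_DIR = "<this-dir>"

Entries = Dict[str, Dict[str, str]]


def url_by_inheritance(entries: Entries, name: str) -> str:
    """Compose the URL from the directory's entry unless the entry overrides it."""
    entry = entries[name]
    if "url" in entry:
        return entry["url"]  # e.g. a file copied in from somewhere else
    return entries[THIS_DIR]["url"].rstrip("/") + "/" + name


def url_explicit(entries: Entries, name: str) -> str:
    """Every entry records its full URL; nothing to compose."""
    return entries[name]["url"]


inheriting: Entries = {
    THIS_DIR: {"url": "http://www.lyra.org/repos/subdir"},
    "bar.c": {},
    "copied.c": {"url": "http://example.org/other/copied.c"},
}

explicit: Entries = {
    THIS_DIR: {"url": "http://www.lyra.org/repos/subdir"},
    "bar.c": {"url": "http://www.lyra.org/repos/subdir/bar.c"},
    "copied.c": {"url": "http://example.org/other/copied.c"},
}
```

Both forms give the same answers; the explicit form just skips the
composition step and the special case for overrides.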

[ and I mildly dislike the name "ancestor" in this context because it seems
to imply the local copy is derived somehow. ]

>...
> 5. ra interface:
>
> You were saying "Why do we need those root_path arguments to the
> vtable functions, when the root_path was already indicated in the
> original URI to open_session()?"
>
> We agree. They're gone, Ben removed them.

Saw that. Cool.

> However, there is a new
> question: does the client side first discover the latest rev num
> and then request it explicitly, or does it simply ask for the
> "latest" and never actually say what rev num that is?
>
> The latter way -- when you request a URI, you're actually
> requesting the latest revision at that URI -- may be too simple.
> The problem is, what about when someone wants to check out a
> revision other than the latest? (This happens all the time, as I'm
> sure you've experienced in using CVS.)

Yah... per my recent email to Geoff, I'm still pondering on the "old version
checkout" problem. I've just now realized it is actually a bit easier than I
had thought.

Consider that the server has multiple trees of resources:

    http://www.lyra.org/repos
    http://www.lyra.org/repos/foo.c
    http://www.lyra.org/repos/subdir
    http://www.lyra.org/repos/subdir/bar.c
    http://www.lyra.org/repos/subdir/baz.c

    http://www.lyra.org/repos/$svn/ver/73
    http://www.lyra.org/repos/$svn/ver/73/foo.c
    http://www.lyra.org/repos/$svn/ver/73/subdir
    http://www.lyra.org/repos/$svn/ver/73/subdir/bar.c
    http://www.lyra.org/repos/$svn/ver/73/subdir/baz.c

    http://www.lyra.org/repos/$svn/ver/74
    http://www.lyra.org/repos/$svn/ver/74/foo.c
    http://www.lyra.org/repos/$svn/ver/74/subdir
    http://www.lyra.org/repos/$svn/ver/74/subdir/bar.c
    http://www.lyra.org/repos/$svn/ver/74/subdir/baz.c

I've been thinking in terms of walking the first set (the "version
controlled resources" or VCRs), asking for pointers to the other sets (the
"version resources"), then fetching data from the other sets.

However, it isn't going to be difficult to walk the version resources
instead. So it becomes a matter of asking the top-level VCR for a pointer to
the top-level version resource, then walking down from there.

To select an older revision, a labeled revision, or a revision for a
specific date/time, it just becomes a matter of mapping to the correct
version resource and walking from there.

[ I don't have this mapping in mind yet, but it shouldn't be difficult; I
think that I may defer it to post m2 (unless it's cake) ]

Key benefit (relevant below) is that once we move over to begin walking the
version resources, then we are fetching a consistent set of resources, even
if somebody does a commit during our update.
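
To illustrate the layout listed above, here is a small sketch that maps a
VCR URL to the matching version resource URL for a given revision. The
function name and its arguments are invented for this example, and a real
client would presumably ask the server for the pointer rather than build
the string itself; this only shows how the two trees line up.

```python
def version_resource_url(repos_root: str, vcr_url: str, rev: int) -> str:
    """Map a version controlled resource URL into the $svn/ver/<rev> tree.

    Hypothetical helper: it assumes the URL layout shown in the listing
    above and does no network access.
    """
    root = repos_root.rstrip("/")
    if vcr_url != root and not vcr_url.startswith(root + "/"):
        raise ValueError(f"{vcr_url!r} is not under {root!r}")
    relative = vcr_url[len(root):]  # "" for the root, else "/subdir/bar.c"
    return f"{root}/$svn/ver/{rev}{relative}"
```

Walking the tree rooted at version_resource_url(root, root, 73) would then
only ever touch resources from revision 73, whatever gets committed in the
meantime.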

> Don't have a definite proposal yet for this one, would like to know
> what you think, but anyway we need to have a way to request
> non-latest revs.

I was about to say just pass a revnum parameter, and have a new symbol named
"SVN_REVNUM_LATEST" (-2) that lets the RA layer figure out the optimal way
to fetch the latest. But... there are more options to consider...

> Here's my initial thought on a solution:
>
> open_session (URI, &TOK);
> get_latest_revision (TOK) ==> some rev number

TOK would be the session_baton, I presume?

> (Cached inside TOK so no extra
> network turnaround is required.
> It doesn't matter that it might
> not be the true latest anymore.
> Heck, that could happen even when
> someone requests the latest
> directly from the server -- in the
> time it takes for the answer to
> come back to the client, a new
> latest might appear anyway! So we
> shouldn't worry too much about
> that.)

You'd just pass the revision back, rather than caching it. You may be using
that revision for who-knows-what. It is better to just pass it in,
especially given the multiple APIs that I detail further below.

Explicitly asking for the latest revision (so it can be returned) is more
expensive than beginning a fetch and discovering the latest at that point.
But that also means we're beginning the checkout-walk and need to tell the
passed-in-editor what revision we found.

> This latest rev number would be used during checkout(), say: the
> client passes it to get_checkout_editor(), and the returned
> editor/baton combination has that revision stored, and is therefore
> able to set up all the working copy data correctly even though the
> editor's driver might not ever pass a revision number to an editor
> callback.

Given the rev number, and a guarantee that the checkout editor will always
drive resources with *that* revision, then this model works.

[ this discussion has made me realize the current checkout driver doesn't
  guarantee consistent revisions during a checkout; you could end up with v7,
  then some v8, and a bit of v9 towards the end. that is badness, regardless
  of this conversation. ]

> This is how checkouts are currently working, the only difference
> being that the revision number is coming from a magic hat, since we
> don't have get_latest_revision() yet.

Yup.

The interface for a checkout really needs to have a number of APIs:

  one of:
    rev = get_latest_revision()
    rev = get_revision_at(date)
    rev = get_revision_named(label)
    rev = 7

  followed by one of:
    do_checkout(rev)
    do_update(rev)

Hmm. An update can also switch to a specific revision, so it would also use
the above APIs.
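
A toy sketch of that two-step shape: resolve a revision once, then drive
the whole operation with it. It is in Python, not C, and ToySession plus
resolve_revision are made-up stand-ins rather than the RA interface; the
revision data is invented too. SVN_REVNUM_LATEST uses the -2 value floated
above.

```python
from datetime import datetime
from typing import Dict, List, Tuple

SVN_REVNUM_LATEST = -2  # sentinel meaning "let the RA layer find the latest"


class ToySession:
    """In-memory stand-in for an RA session (hypothetical, for illustration)."""

    def __init__(self, commit_dates: Dict[int, datetime], labels: Dict[str, int]):
        self._dates = commit_dates
        self._labels = labels
        self.log: List[Tuple[str, int]] = []

    def get_latest_revision(self) -> int:
        return max(self._dates)

    def get_revision_at(self, when: datetime) -> int:
        # Newest revision committed at or before the requested time.
        candidates = [rev for rev, date in self._dates.items() if date <= when]
        if not candidates:
            raise LookupError(f"no revision at or before {when}")
        return max(candidates)

    def get_revision_named(self, label: str) -> int:
        return self._labels[label]

    def do_checkout(self, rev: int) -> None:
        self.log.append(("checkout", rev))

    def do_update(self, rev: int) -> None:
        self.log.append(("update", rev))


def resolve_revision(session: ToySession, rev: int = SVN_REVNUM_LATEST) -> int:
    """Turn the sentinel into a concrete number; pass real numbers through."""
    if rev == SVN_REVNUM_LATEST:
        return session.get_latest_revision()
    return rev
```

Whichever of the four ways produced the number, do_checkout() and
do_update() only ever see a concrete revision, which is what lets the
editor guarantee a consistent tree.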

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Received on Sat Oct 21 14:36:18 2006
