Re: Greg Stein's several issues: proposed resolutions

From: Karl Fogel <kfogel_at_galois.collab.net>
Date: 2000-12-26 16:42:07 CET

Greg, I've been careful only to cut settled items from the topic list
here. :-)

Greg Stein <gstein@lyra.org> writes:
> > 2. Detecting revision-shift during an update.
> >
> > Suppose you're running an update. The client first reports local
> > state, and then starts getting an appropriately-tuned tree delta
> > back from the server. But immediately after reporting local state,
> > a smaller, quicker update happens to some file that's also involved
> > in the larger update. Now the larger incoming tree delta will be
> > wrong.
> >
> > Question: Should we rejigger the editor interface to detect this
> > case?
> >
> > Answer: No, it can be handled entirely with client side bookkeeping
> > or locking, and should be thus handled. Adding new arguments would
> > result in only minimally more convenience in this one case, and
> > the arguments would be ignored baggage in most other cases.
>
> Bookkeeping: you could end up storing a lot of state on the client; this
> doesn't hold well with our intended "streamy" model. We'd have to get sneaky
> to keep our memory footprint down when updating large repositories (ask Jim
> to consider an "svn update" for the GNU tools repository)
>
> Locking: you'd need to lock every dir involved in the update; the more dirs
> you lock, the more that will need to be forcefully unlocked if a crash or
> "kill" happens. The locks may prevent other operations from happening, such
> as a "diff" or a "status" while the big-ass update is occurring.

The size of the bookkeeping is really not all that bad, just a path
and a revision number for each entity... and only the exceptions at
that! (Same as our reporting mechanism.)
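
A minimal sketch of how small that bookkeeping could be, written in Python for brevity. The names (`ReportedState`, `record`, `shifted`) are hypothetical and not part of any Subversion interface. It simplifies by using a single base revision for the whole update, and it assumes the client can look up an entity's current revision when the delta arrives.

```python
class ReportedState:
    """Remember what the client reported to the server, exceptions only."""

    def __init__(self, base_revision):
        # Revision shared by most of the tree being updated.
        self.base_revision = base_revision
        # path -> revision, only for entities that differ from the base.
        self.exceptions = {}

    def record(self, path, revision):
        # Store an entry only when it deviates from the base revision.
        if revision != self.base_revision:
            self.exceptions[path] = revision

    def reported_revision(self, path):
        # Anything not listed as an exception was reported at the base.
        return self.exceptions.get(path, self.base_revision)

    def shifted(self, path, current_revision):
        # True if the entity changed revision after state was reported,
        # meaning the incoming delta for this path is based on stale state.
        return current_revision != self.reported_revision(path)


state = ReportedState(base_revision=10)
state.record("src/main.c", 10)   # same as base: nothing is stored
state.record("src/util.c", 8)    # exception: stored

# A smaller, quicker update moved src/util.c to revision 12 after the report.
stale = state.shifted("src/util.c", current_revision=12)
```

Only `src/util.c` occupies memory here, so the footprint grows with the number of mixed-revision entities rather than with the size of the tree.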

Locks wouldn't prevent a diff or status from working -- those are
read-only operations. (True, one might get a slightly weird answer
from them at one instant, but the dangers are not great and probably
not worth protecting people against.)

Cleaning up locks after a crash is a general problem, and it's always
the same difficulty to solve -- it doesn't matter if the locks are few
or numerous, nor does it matter what operation set them. There's an
"svn cleanup" command (or whatnot) to do it, since sometimes it has to
be a human who decides to do the cleanup, though SVN can provide
some hints about when it should be done, and occasionally even do it
automatically.

> > Anyway, I just wanted to reassure you that the
> > client can detect this independently of the delta coming back from
> > the server, either by noticing whether the file has changed since
> > state was reported, or by not permitting it to change after state
> > gets reported until the update is done.)
>
> Agreed, it's possible. I'm not sure it is the best approach, however.

Oh, I'm not completely sure either. I suspect it's best, though,
because given the choice of adding complexity to an interface or not,
I'll almost always choose not to. The penalty would have to be really
high in the client code to justify adding new args to the editor api,
IMHO.

> However, this also implies that to delete "foo.c", I need to update the
> directory, which might have an adverse effect on what I was working on. I'd
> like to be able to delete foo.c in isolation.

Mmm. That would be nice, but... Just thinking about it theoretically:
you're making a change to the *directory* here, not to foo.c itself,
so why expect to be able to make the change on an out-of-date copy of
the dir? You wouldn't expect to be able to do that with file
contents, right?

> You're separating the repository from the item's name again. You've just
> moved it from the "repository" file into an attribute of
> SVN_ENTRIES_THIS_DIR. Same concept, new location :-)

Sorry, I wasn't clear -- I meant, since this is part of ancestry, it
would probably get folded into the existing `ancestor' attribute.
That is, the ancestor field would become a full URL, as you also
advocated.

I didn't come right out and say that, because I wasn't sure it was the
best way, that's all.

> I'd just say that we specify the URL and don't try to do any inheriting.
> This kind of depends on the operations that consume that value, of course.
> Will a URL always be okay?

You win, you win! Uncle! You won before you even sent your mail!

:-)

Actually, there might be cases where having ancestry be a full URL
would be a hindrance. So I'm not 100% sure it's better to store the
repository itself as part of the path ancestry, although I share your
instinct that it would be nice. Two disadvantages, though:

   1. Once you're in the repository, passing around the full URL
      instead of just the path portion would be cumbersome.

   2. We lose the ability to tell whether two entities are from the
      same repository via a simple strcmp() (and surely this is
      important, for knowing to whom one must open a network
      connection! :-) ). The strcmp() will always return non-zero
      now, even when they come from the same repository.

I guess my main point is: every entry should have the ability to
specify its repository. Maybe we want to do that by making `ancestor'
be a URL, or maybe not, but that's a relatively minor technical
issue, and doesn't need to be resolved this moment.
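
To illustrate the second disadvantage, here is a small Python sketch. The URLs and the `repo_roots` list are made up for the example, and the idea that the client keeps a list of known repository roots is an assumption, not something settled in this thread.

```python
def same_string(a, b):
    # Python equivalent of strcmp(a, b) == 0.
    return a == b


def same_repository(a, b, repo_roots):
    """Decide repository identity by finding the known root each URL is under."""
    def root_of(url):
        for root in repo_roots:
            prefix = root.rstrip("/") + "/"
            if url == root or url.startswith(prefix):
                return root
        return None  # URL is not under any known repository root

    root_a, root_b = root_of(a), root_of(b)
    return root_a is not None and root_a == root_b


# Hypothetical repository roots and ancestor URLs.
roots = ["http://svn.example.org/repos/proj", "http://svn.example.net/repos/other"]
a = "http://svn.example.org/repos/proj/trunk/foo.c"
b = "http://svn.example.org/repos/proj/trunk/bar.c"
c = "http://svn.example.net/repos/other/trunk/foo.c"

urls_equal = same_string(a, b)               # full URLs differ
same_repo = same_repository(a, b, roots)     # but the repository is shared
other_repo = same_repository(a, c, roots)    # different repositories
```

With a separate repository field the first comparison would be enough; with full URLs the client needs some extra knowledge (here, the list of roots) to answer the same question.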

> [ and I mildly dislike the name "ancestor" in this context because it seems
> to imply the local copy is derived somehow. ]

Oh, I see what you mean. Ancestor implies an earlier revision than
the copy here, when it's really the same revision, just located in the
repository.

Thoughts for a better term? I like this:

:-)

> Key benefit (relevant below) is that once we move over to begin walking the
> version resources, then we are fetching a consistent set of resources, even
> if somebody does a commit during our update.
>
> > Don't have a definite proposal yet for this one, would like to know
> > what you think, but anyway we need to have a way to request
> > non-latest revs.
>
> I was about to say just pass a revnum parameter, and have a new symbol named
> "SVN_REVNUM_LATEST" (-2) that lets the RA layer figure out the optimal way
> to fetch the latest. But... there are more options to consider...

Wouldn't it just be simpler to require clients to always request a
specific revision, and then make sure the client knows the latest
revision number so it can request it?

> > open_session (URI, &TOK);
> > get_latest_revision (TOK) ==> some rev number
>
> TOK would be the session_baton, I presume?

Yeah.

> > (Cached inside TOK so no extra
> > network turnaround is required.
> > It doesn't matter that it might
> > not be the true latest anymore.
> > Heck, that could happen even when
> > someone requests the latest
> > directly from the server -- in the
> > time it takes for the answer to
> > come back to the client, a new
> > latest might appear anyway! So we
> > shouldn't worry too much about
> > that.)
>
> You'd just pass the revision back, rather than caching it. You may be using
> that revision for who-knows-what. It is better to just pass it in,
> especially given the multiple APIs that I detail further below.

Wow. Confused, sorry: who would "just pass the revision back", and to
whom? The only point of the above interface is to save one network
turnaround -- instead of the client asking explicitly for the latest
revision, it just gets it automatically on any open_session() with a
repository. After that, the client *does* explicitly pass it back to
further calls...
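
A rough Python sketch of that idea, with a fake server that counts round trips. `FakeServer`, `handshake`, and the `Session` class are hypothetical stand-ins; only `open_session` and `get_latest_revision` are names from the proposal above, and their signatures here are guesses.

```python
class FakeServer:
    """Stand-in for a repository server that counts network round trips."""

    def __init__(self, latest):
        self.latest = latest
        self.round_trips = 0

    def handshake(self):
        # Opening a session costs one round trip; the reply carries the
        # latest revision number along with it.
        self.round_trips += 1
        return {"latest_revision": self.latest}


class Session:
    """Plays the role of TOK, the session baton."""

    def __init__(self, server):
        reply = server.handshake()
        self._latest = reply["latest_revision"]

    def get_latest_revision(self):
        # Answered from the cached value: no network traffic.
        return self._latest


def open_session(server):
    return Session(server)


server = FakeServer(latest=42)
tok = open_session(server)
rev = tok.get_latest_revision()

server.latest = 43                     # someone commits in the meantime
still = tok.get_latest_revision()      # cached value is now stale, by design
```

The client then passes `rev` explicitly to whatever call comes next; the cache only saves the extra request for it.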

> Explicitly asking for the latest revision (so it can be returned) is more
> expensive than beginning a fetch and discovering the latest at that point.
> But that also means we're beginning the checkout-walk and need to tell the
> passed-in-editor what revision we found.

I'm sorry, I'm completely missing something here. Can you use
explicit antecedents? Instead of "we're beginning the checkout-walk
and need to tell the passed-in-editor what revision we found", tell me
which side is doing what, maybe...

> > This latest rev number would be used during checkout(), say: the
> > client passes it to get_checkout_editor(), and the returned
> > editor/baton combination has that revision stored, and is therefore
> > able to set up all the working copy data correctly even though the
> > editor's driver might not ever pass a revision number to an editor
> > callback.
>
> Given the rev number, and a guarantee that the checkout editor will always
> drive resources with *that* revision, then this model works.

Right, that's what I was aiming for too.
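
As a sketch of that model in Python: the editor is created with the revision, and the driver never mentions a revision in any callback. `CheckoutEditor`, the simplified `add_file` signature, and `drive` are hypothetical and far smaller than the real editor interface; only `get_checkout_editor` is a name from the discussion.

```python
class CheckoutEditor:
    """Editor/baton combination that remembers the revision it was built for."""

    def __init__(self, revision):
        self.revision = revision
        self.entries = {}  # path -> revision recorded in the working copy

    def add_file(self, path):
        # The callback carries no revision argument; the stored one is used.
        self.entries[path] = self.revision


def get_checkout_editor(revision):
    return CheckoutEditor(revision)


def drive(editor, paths):
    # The driver only describes the tree; it never passes a revision.
    for path in paths:
        editor.add_file(path)


editor = get_checkout_editor(7)
drive(editor, ["trunk/foo.c", "trunk/bar.c"])
```

This only works under the guarantee stated above: everything the driver sends must really belong to the revision the editor was given.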

> > This is how checkouts are currently working, the only difference
> > being that the revision number is coming from a magic hat, since we
> > don't have get_latest_revision() yet.
>
> Yup.
>
> The interface for a checkout really needs to have a number of APIs:
>
> one of:
> rev = get_latest_revision()
> rev = get_revision_at(date)
> rev = get_revision_named(label)
> rev = 7
>
> followed by one of:
> do_checkout(rev)
> do_update(rev)
>
> Hmm. An update can also switch to a specific revision, so it would also use
> the above APIs.

Yup. But typical checkouts/updates (heck, they're the same thing)
usually want the latest rev, so returning that as part of the session
baton from open_session() and omitting the get_latest_revision()
function seems useful. The other get_revision_*() functions are
still necessary, of course.

I think we're on the same page, my confusion above notwithstanding.

-K
Received on Sat Oct 21 14:36:18 2006
