Re: (FS) operational question

From: Greg Stein <gstein_at_lyra.org>
Date: 2001-01-01 01:09:01 CET

On Sun, Dec 31, 2000 at 06:28:48PM -0500, Greg Hudson wrote:
> Greg Stein wrote:
> >> If we do lazy updating on the client, then we get fragmentation. If
> >> we want to update them all, then the server response grows larger.
>
> (We're talking about the version resources here, not the version in
> the entries file, yes?)

The version resource URLs, yes.

[ quick aside: a version resource is a specific instance of a
  version-controlled resource (VCR) for a given version. e.g. "revision 6 of
  foo.c". we have one VCR ("foo.c") and multiple version resources. The
  version resource URL is handed out by the server to tell the client where
  to fetch the version resource. ]

Assume that the revision number is embedded in the version resource URL.
Since the client cannot rebuild the URL when a revision-change occurs, it
must receive those from the server.

1) we receive new version resource URLs for every file/dir in the WC each
   time a revision is created. obviously, the server response can now grow
   to be quite large.

2) we only update version resource URLs for the things that change. since we
   must update version resource URLs and revision numbers in tandem, this
   means that we do not update revision numbers for the things that don't
   change. the side effect is that we now get a scattering of revision
   numbers in the WC, generating a large client-state report during an
   "update" process.

> What's the downside of fragmentation?

The size of the report that details the client-state during an update. We
have to tell the server "here is our state" and the server responds with the
things that will need to be updated.

If the WC is all one revision, then we tell the server "I'm at revision 67"
and that's it. If the client is fragmented, then we say "the root is 67,
root/foo is 68, root/foo/bar.c is 73, root/foo/baz.c is 71, ..."
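
To make the size difference concrete, here is a minimal sketch of such a
client-state report. It assumes (this is not spelled out above) that a path
only needs its own entry when its revision differs from its parent's; the
function name and the dict-based working copy are hypothetical, not
libsvn_wc code.

```python
import posixpath

def state_report(wc):
    """Return the (path, revision) entries a client would have to send.

    `wc` is a hypothetical working copy modeled as {path: revision}.
    Assumption: a path inherits its parent's revision unless it is
    reported explicitly, so only the root and the "exceptions" are sent.
    """
    report = []
    for path in sorted(wc):
        parent = posixpath.dirname(path)
        # Report the root, and any entry that differs from its parent.
        if parent not in wc or wc[parent] != wc[path]:
            report.append((path, wc[path]))
    return report

# A working copy that is entirely at revision 67.
uniform = {"root": 67, "root/foo": 67,
           "root/foo/bar.c": 67, "root/foo/baz.c": 67}

# The fragmented working copy from the example above.
fragmented = {"root": 67, "root/foo": 68,
              "root/foo/bar.c": 73, "root/foo/baz.c": 71}

print(state_report(uniform))
print(state_report(fragmented))
```

Under that assumption the uniform working copy produces a single entry,
while the fragmented one produces an entry for every path.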

The simple summary: if a revision number is embedded within the version
resource URL, then we end up with large network requests or responses. Pick
one :-)

My proposed solution is to use the ID within the version resource URL. That
would eliminate the need to update them (keeping the server response small),
yet we can still update revision numbers within the WC (keeping the request
size small).

The cost of this change is that mod_dav_svn must be able to open nodes by
ID (used during a fetch). During a commit, I'd open a tree for the latest
revision, open the node in question, and validate that the provided ID
matches the ID of what I just opened (if not, the client is not up to date).

The nice side effect (which I just realized) is that the client can make
changes against v67 of a file and commit it, even though the repository is
at v1000. If that file hasn't changed, then mod_dav_svn / FS would consider
it up to date since the ID matched.

In the revision-in-the-URL approach, the client would say "I am changing
v67" and the server would need to open *two* nodes (v67 and v1000) and check
whether their IDs match.
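
The two commit-time checks can be compared with a toy model. This is only a
sketch under stated assumptions: ToyFS, its methods, and the string node IDs
are hypothetical stand-ins, not the real FS or mod_dav_svn API. It counts
node opens to show the one-versus-two difference described above.

```python
class ToyFS:
    """Hypothetical stand-in for the FS: {revision: {path: node_id}}."""

    def __init__(self, revisions):
        self.revisions = revisions
        self.opens = 0  # number of node opens performed so far

    def youngest(self):
        return max(self.revisions)

    def open_node(self, rev, path):
        self.opens += 1
        return self.revisions[rev][path]


def up_to_date_by_id(fs, path, client_id):
    # ID-in-URL: open the node in the latest revision and compare its ID
    # with the ID the client supplied. One open.
    return fs.open_node(fs.youngest(), path) == client_id


def up_to_date_by_revision(fs, path, client_rev):
    # Revision-in-URL: open the node in the client's revision and in the
    # latest revision, then compare the two IDs. Two opens.
    return fs.open_node(client_rev, path) == fs.open_node(fs.youngest(), path)


# foo.c is unchanged between 67 and 1000; bar.c has changed.
REVISIONS = {
    67: {"foo.c": "id-a", "bar.c": "id-b"},
    1000: {"foo.c": "id-a", "bar.c": "id-c"},
}

fs = ToyFS(REVISIONS)
print(up_to_date_by_id(fs, "foo.c", "id-a"), fs.opens)
print(up_to_date_by_revision(fs, "foo.c", 67), fs.opens)
```

In this model a client holding foo.c from revision 67 is accepted by either
check, but the ID-based check gets there with a single open.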

> > It's that last sentence I'm not believing... Why does the server
> > response have to get bigger? After an update, the client knows that
> > *every* entity within the update's "purview" is now at the new
> > revision number. But only entities that actually changed in the
> > update need space in the server's response.
>
> > You don't need to update those URLs; libsvn_wc can do it, you don't
> > worry about it.
>
> libsvn_wc cannot do it; a version resource URL is opaque and cannot be
> operated on by the client.

Correct. Based on the assumption:

> (If we assume Subversion conventions for
> those URLs, breaking interoperability, then we don't need to store
> version URLs in the first place. Although... our other bit of
> non-interoperability is in "update" as well; we could declare that
> particular operation to be non-interoperable.)

The use of DAV is to promote *future* interoperability. When that future
arrives is anybody's guess. But I'm thinking it will be sooner rather than
later. I've already had an inquiry from somebody asking how "thick" the
server is because they would like to make their server SVN-compatible (e.g.
our client would operate against their server, too). Granted, they could
also build in specific SVN behaviors, but they would much prefer to stick to
plain old DAV wherever possible.

Having a server that is as close to the DeltaV spec as possible is also
goodness because of the growth in DAV clients.

All that said: I know people are concerned about whether the use of DAV is
impacting SVN's design. In this case, it is to a *very* limited extent. As I
mentioned: I think the only new API needed is to open a node by id. That
function already exists within the FS, but is currently private. I asked Jim
about exposing it once before, but he said the benefit was small (open by
path was nearly as fast as open by ID). There was also something related to
clones, but that doesn't apply in our case: we *don't* want to see a clone.

The extra API doesn't change the model.

Karl also questioned whether the insertion of the ID into the URL locks us
into a particular server model. Absolutely not. That is *exactly* why the
URLs are provided *only* by the server. The client doesn't know the ID is in
there, and it makes no assumptions that imply the ID is in there. If we
change the model on the server, we simply return different version resource
URLs. Simple as that, and the client is none the wiser.
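
A small sketch of why opaque URLs keep the server model free to change. All
names and both URL layouts here are hypothetical illustrations, not what
mod_dav_svn actually emits. The client stores whatever string the server
hands back and later presents it unchanged.

```python
class ToyServer:
    """Hypothetical server whose URL scheme is a private detail."""

    def __init__(self, make_url):
        self.make_url = make_url  # server-private URL scheme
        self.by_url = {}          # url -> stored content

    def publish(self, path, rev, node_id, content):
        url = self.make_url(path, rev, node_id)
        self.by_url[url] = content
        return url                # the only thing the client ever sees

    def get(self, url):
        return self.by_url[url]


def by_id(path, rev, node_id):
    # Illustrative layout with the node ID embedded.
    return "/ver/id/" + node_id


def by_rev(path, rev, node_id):
    # Illustrative layout with the revision number embedded.
    return "/ver/rev/%d/%s" % (rev, path)


def client_roundtrip(server):
    # The client records the URL verbatim and never parses or rebuilds it.
    wc = {"foo.c": server.publish("foo.c", 67, "id-a", b"contents of foo.c")}
    return server.get(wc["foo.c"])


print(client_roundtrip(ToyServer(by_id)))
print(client_roundtrip(ToyServer(by_rev)))
```

The client code is identical for both servers; only the strings it stores
differ.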

In summary: the ID-based version resource URL requires exposing an API on
the server, and it does not lock us into a particular server model. It
minimizes the request and response sizes for our network operations (we
always say this is the lengthy time and trade off other stuff against it;
well, why make it longer than necessary? :-). Using the ID also gives us a
better mapping against the DeltaV model, which promotes future interop.

Shorter summary: we have many benefits, few costs.

My initial query was whether this would work within the FS. Nobody is
considering that; the discussion has been about whether it is "right" or
not. Can I stop explaining why it is right now? And get back to whether the
use of ID is feasible for the FS?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Received on Sat Oct 21 14:36:18 2006
