
Re: [subversion-dev] Subversion design document up

From: Jonathan S. Shapiro <shap_at_eros-os.org>
Date: 2000-06-07 00:37:46 CEST

> "Jonathan S. Shapiro" <shap@eros-os.org> writes:
> > Where do you see the efficiency issue? I'm clearly missing something.
> When you update a subdirectory, I think.

There are two reasons to do localized updates: performance and intent. The
performance rationale is always a bad one, and reflects the need to work
around a deficiency in the tool. The intent case is important.

Let me describe how this is handled in DCMS, and I think it will be apparent
that there isn't a performance issue in selective update.

DCMS keeps a file on the client side that describes the state of the
workspace. This file contains, for each object (in this case for each file)
the SHA-1 hash of that file. When I type:

    dcms update

the default behavior is to find the "top" of the workspace and update
everything in the workspace. That is, commands are presumed to be about the
workspace, not about the current directory.

The user agent checks to see if a new branch exists. If so, it does a
three-way merge between the new branch, the branch I had checked out, and
the current state of the workspace. The time-consuming part of this is
downloading the file that describes the new branch, not identifying the
triples for the merge.
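
To make the "identifying the triples" step concrete, here is a rough sketch
in Python. It is not DCMS code. It assumes each of the three states (the
branch that was checked out, the new branch, the workspace) can be reduced to
a mapping from file name to content hash; the function name and the
"take"/"merge" labels are made up for illustration.

```python
def merge_triples(base, new, work):
    """Classify files given three name -> hash mappings.

    base: the branch that was checked out
    new:  the newly downloaded branch
    work: the current state of the workspace
    Returns (name, action) pairs for files that need attention.
    """
    actions = []
    for name in sorted(set(base) | set(new) | set(work)):
        b, n, w = base.get(name), new.get(name), work.get(name)
        if n == b:
            continue  # unchanged on the new branch: keep the workspace copy
        if w == n:
            continue  # workspace already matches the new branch
        if w == b:
            actions.append((name, "take"))   # no local change: take the new version
        else:
            actions.append((name, "merge"))  # both sides changed: real three-way merge
    return actions
```

Only dictionary lookups are involved, which is consistent with the point
above that the download, not the comparison, dominates the cost.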

Because the workspace file contains the SHA-1 hash of each file as of the
time it was checked out, identifying the files that are locally changed
requires no communication with the server. I was initially concerned about
the expense of recomputing the checksums, but it's a non-issue. The "dcms
status" command (which is essentially what is being done here) is very fast,
and can be done offline.
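
As an illustration of why this needs no server round trip, here is a minimal
sketch in Python. It is not the DCMS implementation. It assumes the workspace
file can be read as a mapping from relative file name to the SHA-1 recorded at
checkout; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 of a file, read in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def locally_changed(root: Path, recorded: dict[str, str]) -> list[str]:
    """List files whose current hash differs from the recorded one.

    `recorded` stands in for the workspace file (an assumption).
    A file that has gone missing is also reported as changed.
    """
    changed = []
    for name, old_hash in sorted(recorded.items()):
        path = root / name
        if not path.is_file() or sha1_of(path) != old_hash:
            changed.append(name)
    return changed
```

Everything here reads only local files, so the check works offline.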

If you want selective update, you state the directory name (or the file
name). In that case, the update is restricted to those files whose fsName
matches the pattern on the command line. It's purely a "glob" style match,
and requires no remote communication.
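
A small Python sketch of that kind of filter, under the assumption that a
directory name on the command line should select everything beneath it. The
function name is hypothetical, and the real DCMS matching rules may differ.

```python
from fnmatch import fnmatchcase


def select_for_update(fs_names, patterns):
    """Return the names matching any command-line pattern.

    With no patterns, the whole workspace is selected (the default).
    A pattern also matches everything below it, so "src" selects
    "src/main.c". Note that fnmatch's "*" matches "/" as well.
    """
    if not patterns:
        return list(fs_names)
    selected = []
    for name in fs_names:
        for pat in patterns:
            pat = pat.rstrip("/")
            if fnmatchcase(name, pat) or fnmatchcase(name, pat + "/*"):
                selected.append(name)
                break
    return selected
```

The match runs over names already held in the workspace file, so it is a
purely local string comparison.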

If you have enough entities in the workspace that checking their names is a
serious performance bottleneck, it seems likely that the pattern match isn't
where the performance problem is :-)

In the "laptop" case, there is an additional issue, which is that we assume
there is a repository on the laptop that is a partial clone of some
"official" repository. Think of it as a cache. The new branch is downloaded
through it, as are any changed entities. Note that, because cryptographic
hashes are used as an internal naming mechanism, it is never necessary to
download material that you already have.
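
The caching property can be sketched with a toy content-addressed store in
Python. This is an assumption-laden illustration, not the DCMS repository
format: the class, the in-memory dictionary, and the sync function are all
hypothetical.

```python
import hashlib


class ContentStore:
    """Toy object store keyed by the SHA-1 of each object's content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def has(self, key: str) -> bool:
        return key in self._objects

    def get(self, key: str) -> bytes:
        return self._objects[key]


def sync(wanted, local, remote):
    """Copy into `local` only the objects it lacks; return the keys fetched."""
    fetched = []
    for key in wanted:
        if not local.has(key):
            local.put(remote.get(key))
            fetched.append(key)
    return fetched
```

Because the name of an object is derived from its content, a name the local
store already knows never has to be fetched again.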

Received on Sat Oct 21 14:36:05 2006

This is an archived mail posted to the Subversion Dev mailing list.