revnum (still) considered harmful

From: Tom Lord <lord_at_regexps.com>
Date: 2002-12-16 22:29:31 CET

> You've misunderstood the code (or ghudson's ra_svn protocol
> is broken, which I highly doubt).

       I think the confusing bit is that the set-target-rev editor
       function is used for updates and similar operations, not for
       commits.

I was confused by reading and misinterpreting the `protocol' file in
the ra_svn directory and the description of the database schema in the
`fs' directory.

An admittedly quick read through the schema document made it seem that
pending transactions are recorded in the database and that that record
includes a transaction number -- which implies the txn number is
assigned early.

The confusion was reinforced by discussion on this list about certain
usage errors / bugs(?). Specifically, it seemed to me that early in
the transaction, a commit examines the revnum of the repository to
make sure that the wd is up-to-date wrt that revnum, and refuses to
proceed if it is not. That, too, implies that the client (effectively)
knows its new revnum early in the txn. (I suppose now, in retrospect,
that the commit is not looking at the global revnum, but only at the
last revnum at which files being committed previously changed.)

I think there are still two problems with revnum: (a) a (much
reduced) performance limitation; (b) a semantic problem from the
source mgt. perspective.

(a) the (much reduced) performance limitation:

  While assigning revnum late is far better than assigning it early, the
  existence of revnum _still_ limits server scalability (though in a
  less serious way). In particular, if a single repository is
  implemented over a distributed database, all of the participating
  servers must still synchronize for every transaction in order to
  allocate txn numbers -- you'll still have either a single thread of
  execution or a distributed commit protocol through which all commits
  must pass.

  With no revnum, concurrent, non-overlapping txns can be unordered --
  for example, using a distributed database, synchronization for a set of
  such transactions can be coalesced (reducing the total number of
  syncs) and can take place asynchronously wrt the txns themselves
  (e.g., well after they have completed and clients have moved on).

Realistically (imo), _this_ performance problem can only ever really
be important for utterly huge transaction rates.
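
To make the contrast in (a) concrete, here is a minimal sketch. It is not
Subversion's implementation: the class names `RevnumRepo` and
`UnorderedRepo`, the method `sync_batches`, and the idea of modelling a txn
as a set of touched paths are all assumptions made up for illustration. The
first class funnels every commit through one lock to hand out the next
revnum; the second gives each commit an opaque id and only later groups
non-overlapping commits into batches, one sync per batch.

```python
import itertools
import threading
import uuid


class RevnumRepo:
    """Toy model: every commit serializes on one lock to get its revnum."""

    def __init__(self):
        self._lock = threading.Lock()    # the single synchronization point
        self._next = itertools.count(1)
        self.log = []                    # list of (revnum, touched paths)

    def commit(self, paths):
        with self._lock:                 # all commits pass through here
            revnum = next(self._next)
            self.log.append((revnum, frozenset(paths)))
            return revnum


class UnorderedRepo:
    """Toy model: commits get an opaque id; syncing is deferred and batched."""

    def __init__(self):
        self.pending = []                # commits not yet synchronized

    def commit(self, paths):
        txn_id = uuid.uuid4().hex        # no global counter involved
        self.pending.append((txn_id, frozenset(paths)))
        return txn_id

    def sync_batches(self):
        """Group pending commits into batches with pairwise disjoint paths.

        A commit that overlaps the current batch starts a new batch, so
        overlapping commits keep their arrival order across batches.
        """
        batches, current, touched = [], [], set()
        for txn_id, paths in self.pending:
            if touched & paths:          # overlap: close the current batch
                batches.append(current)
                current, touched = [], set()
            current.append(txn_id)
            touched |= paths
        if current:
            batches.append(current)
        self.pending = []
        return batches
```

In this toy, three commits touching {a.c}, {b.c} and {a.c, d.c} need three
trips through the lock in `RevnumRepo`, but only two syncs in
`UnorderedRepo`, and those syncs can happen after the clients have moved on.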

(b) the source mgt problem:

Revnum is harmful for another reason that has nothing to do with
concurrency.

  If I'm reading the FAQ correctly ( :-), revnum is, in essence, an
  implementation detail -- it is "mostly hidden" from users for revision
  control purposes.

  Yet within one repository, merge history is expressed wrt. revnum.
  The emerging plan for distributed revision control seems to be aiming
  at recording merge history as <guid,revnum> pairs.

Thus, the plan for merge history keeps track of history in low level
terms that officially have no high-level rev ctl meaning.

  To understand why that's problematic, it's helpful to consider that
  merge history is not only the underlying support for "smart merging"
  -- it's also a record of reference that humans want to be able to
  read. It should be expressed in higher level terms.

  This gets into smart changeset management. For example, in a single line
  of development one would ideally like human-consumable names for each
  revision, and (at least in the branches critical to a large
  development effort), to regard each revision as a particular,
  purposeful changeset. A query about the revisions for project `foo'
  might generate a list like:

        foo-rev1 added feature xyzzy
        foo-rev2 added feature quux
        foo-rev3 fixed bug #1234
        ....

  When two related lines are merged or partially merged, those changesets
  are the ideal "unit of merging". One might ask "on my branch, what's
  been merged in from the foo mainline?" and get:

        foo-rev1
        foo-rev3

  or ask "what's missing from foo?" and get:

        foo-rev2

  and then, the human reader knows: "The feature `quux' has not been
  merged into foobranch". And the humans have friendly names for the
  changesets in question.
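
The two queries above amount to set membership over changeset names. A
minimal sketch, assuming merge history is recorded as the set of mainline
changeset names already applied to the branch; the function names
`whats_merged` and `whats_missing` and the data layout are hypothetical, not
taken from any existing tool:

```python
# Mainline changesets in order, using the names and summaries from the
# `foo' listing above.
mainline = {
    "foo-rev1": "added feature xyzzy",
    "foo-rev2": "added feature quux",
    "foo-rev3": "fixed bug #1234",
}

# Assumed merge history of the branch: mainline changesets already applied.
merged_into_branch = {"foo-rev1", "foo-rev3"}


def whats_merged(mainline, merged):
    """Mainline changesets present in the branch, in mainline order."""
    return [name for name in mainline if name in merged]


def whats_missing(mainline, merged):
    """Mainline changesets not yet merged into the branch."""
    return [name for name in mainline if name not in merged]


for name in whats_missing(mainline, merged_into_branch):
    print(name, mainline[name])
```

Because the record is keyed by name rather than by <guid,revnum>, the answer
is something a person can read directly, and nothing in it depends on which
repository (or which rev ctl system) produced the changeset.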

  Moreover, by giving revisions more meaningful, less
  repository-specific names like this, it becomes practical to
  put the tar bundle:

        foo-rev2-patch.tar.gz

  on your site, let people merge that with a `patch'-like tool, and have
  the effect be the same as if they'd done an operation between
  repositories.

  It also becomes possible to have "smart merging" technology not be
  specific to any particular rev ctl system -- but to instead have
  systems be interoperable in this regard. I can have a branch in my
  svn repository of a line in your arch repository and smart merge
  between those.

  So, I think that both the intra-repository and global revision names
  for merging purposes should not be based on revnum, but on an
  independent, higher-level namespace.

-t

Received on Mon Dec 16 22:17:59 2002

In reply to: Greg Hudson: "Re: revnum considered harmful"