
Pegged diffs [was: [PATCH] svn diff -r1:BASE URL asserts]

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2006-03-13 12:18:25 CET

Malcolm Rowe wrote (earlier):
> Okay, perhaps I have the definition of a 'pegged diff' wrong - what
> exactly is it? I assumed, I guess, that it was one in which we needed
> to follow history to get the name of the item at the operative revisions.

Gosh, this thread has really made me think hard about what we mean. After
several hours and several revisions of my reply, here's what I think, starting
from first principles and working up towards higher-level concepts.


An OBJECT under version control (I mean a file or a directory tree; would
"node" be more consistent with other docs?) is identified by one of:

   * a WC-path (under version control)

   * a URL in a certain revision of the repository

An object has a LINE OF HISTORY which records its life from creation,
potentially through modifications, renames, and/or deletion. (Looking forward
in history, there is the complication that there are potentially multiple lines
of history because of copying, but that's not relevant to the present
discussion.) Identifying an object at any stage of its life implicitly
identifies its line of history.

When a revision is used in conjunction with a path-in-repository to identify an
object, and the object's line of history is followed in order to find a later
or earlier revision in that history, the initial revision is known as the PEG
REVISION.

If we just want to locate a particular object as it exists now or existed in
history, and we know what its path was at that time and do not need to follow
its history, we will specify the object in the same way (WC path, or URL in a
particular revision), and we might still call this particular revision the
"peg" revision, particularly when referring to the syntax "@REV" with which it
is specified.

Any target to an 'svn' command is therefore specified as:

   WCPATH (implicit @WC)

   URL (implicit @HEAD)
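
To make the implicit pegs concrete, here is a minimal sketch of how a target argument could be split into a path and a peg. It is not Subversion's actual parser: the names Target and parse_target are hypothetical, "WC" is only the notation used in this mail rather than a keyword the client accepts, and the "looks like a URL" test is a deliberate simplification.

```python
from typing import NamedTuple


class Target(NamedTuple):
    path: str     # WC path or URL, without any "@PEG" suffix
    is_url: bool  # True when the path looks like a URL
    peg: str      # peg as written, or the implicit default


def parse_target(arg: str) -> Target:
    """Split "PATH[@PEG]" and fill in the implicit peg (illustrative only)."""
    path, sep, peg = arg.rpartition("@")
    # No "@" at all, or an "@" that belongs to the path itself
    # (for example "http://user@host/repo"): treat as having no peg.
    if not sep or "/" in peg:
        path, peg = arg, ""
    is_url = "://" in path  # assumption: a crude test for "is a URL"
    if not peg:
        peg = "HEAD" if is_url else "WC"
    return Target(path, is_url, peg)
```

With this sketch, "foo.c" gets the peg "WC", a bare URL gets "HEAD", and "URL@30" keeps the explicit peg "30".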

When we specify a plain WC path, we are specifying an object by the path it has
"now" in this working copy, not at some previous time when its WC path might
have been different. After "svn move" or "svn rm", the base path will no
longer exist on disk. The following is not currently supported, but would be
meaningful:

   WCPATH@BASE (to specify the WCPATH that existed before mv/rm)

The following do not make sense:

   URL@WC/BASE/COMMITTED/PREV (these keywords have no meaning without a WC)

   WCPATH@HEAD/N/{DATE} (a WC path has no intrinsic meaning in repos *)

   WCPATH@COMMITTED/PREV (these keywords have no meaning until a WC is
                                consulted, but the peg syntax requires them to
                                identify a revision in which the WC path can
                                be found)
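
The combinations above can be summarised as a small check. This is only a sketch of the rules as I have stated them, not of what the client currently enforces; the function name peg_makes_sense is hypothetical, and "WC" is again this mail's notation for the implicit peg of a WC path.

```python
# Keywords that can only be interpreted by consulting a working copy.
WC_ONLY_KEYWORDS = {"WC", "BASE", "COMMITTED", "PREV"}


def peg_makes_sense(is_url: bool, peg: str) -> bool:
    """Return True if this kind of target can sensibly carry this peg."""
    if is_url:
        # URL@HEAD, URL@N and URL@{DATE} are fine; WC keywords are not.
        return peg not in WC_ONLY_KEYWORDS
    # A WC path carries the implicit "WC" peg; "BASE" is the proposed
    # (currently unsupported) way to name the pre-move/pre-delete path.
    return peg in {"WC", "BASE"}
```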

Then, in many commands, a separate OPERATIVE REVISION (or a pair of them) can
be specified with "--revision" and Subversion will trace the identified object
through history to that revision, where its name might be different.

"-rX URL@REV" means:
   From the object found at URL in revision REV,
   trace back or forward to revision X.

"-rX wc-path" means:
   From the object found at wc-path (in the WC),
   trace back or forward to revision X.
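
A toy model may help show why the peg matters. The sketch below is not how the repository layer works; the data layout, the paths, the revision numbers and the names HISTORIES, path_in and resolve are all invented for illustration. One file is renamed from foo.c to bar.c in r10, and an unrelated file is later created at the old name in r12.

```python
from typing import List, Optional, Tuple

# One line of history = list of (first_rev, last_rev, path) segments;
# last_rev of None means "still at this path in HEAD".
Segment = Tuple[int, Optional[int], str]

HISTORIES: List[List[Segment]] = [
    [(1, 9, "/trunk/foo.c"), (10, None, "/trunk/bar.c")],  # renamed in r10
    [(12, None, "/trunk/foo.c")],                          # unrelated new file
]


def path_in(line: List[Segment], rev: int) -> Optional[str]:
    """Path this line of history has in revision rev, or None if absent."""
    for first, last, path in line:
        if first <= rev and (last is None or rev <= last):
            return path
    return None


def resolve(path: str, peg: int, operative: int) -> Optional[str]:
    """Model of "-r OPERATIVE PATH@PEG": find the object, then trace it."""
    for line in HISTORIES:
        if path_in(line, peg) == path:       # the peg picks the line of history
            return path_in(line, operative)  # then we follow that line
    raise LookupError(f"{path} does not exist in r{peg}")
```

In this model, resolve("/trunk/bar.c", 15, 5) gives "/trunk/foo.c", whereas resolve("/trunk/foo.c", 15, 5) gives None, because the object called foo.c in r15 is the unrelated file, which did not exist in r5.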

Note that a peg revision applied to a local path doesn't make sense because a
local path (assumed to refer to a versioned object) already locates the object
uniquely; the peg is implicitly "WC".

The syntax described in "svn help" for some commands implies that we currently
allow "wc-path@REV" for arbitrary REVs.

"-rX wc-path@REV" means... what?
   Trace from WC to REV and then further to X? REV is redundant if so.
   Trace from WC to REV and ignore X? "diff --old" does, but that's bad.

(* In the future we might want to give some meaning to "path@REV", where "path"
looks like a WC path, but perhaps means a relative URL based on the current
directory's URL: "foo@100" -> (url(".") + "/foo")@100. This is not a proposal,
just an example of a reason why we should disallow "wc-path@REV" now.)

So "wc-path" always implies a "peg" revision of "WC", in the sense that the WC
is where "wc-path" will be found. (Let's assume for now that we won't allow
the explicit syntax "wc-path@REV".)

For commands that operate on two revisions of a single object, primarily diff
and merge, it is necessary to know in which revision the specified path is to
be found, and a separate peg revision specifier is a convenient and flexible
solution. Commands that operate on a single object (at a time) support a
separately specified peg revision only for convenience and could instead
require the object's path to be specified exactly as it exists in the operative
revision.

Does this all make sense so far?

The two TYPES OF DIFF INVOCATION that we support are:

   (a) Difference between one revision and another revision of an object.

   (b) Difference between one object and another which may be unrelated. This
       is more general, and potentially covers the first type.

In both types, our user interface allows multiple targets, but those are
handled by high-level iteration in svn_cl__diff(). Although that's
inefficient, it means we don't presently have to think about multiple targets
in this discussion which pertains to libsvn_client and lower layers.

In type (a), there is one object's line of history that must be identified, and
also a "start" revision and an "end" revision within that line. This is what I
think we have been calling a "pegged diff". It corresponds to type (1)
invocation syntax in "svn help diff" and is handled by svn_client_diff_peg3().
I would say now that "pegged" is not the clearest name for it; something like
"diff of an object across revisions" would be better.

In type (b), there are two objects that must be identified. Each can be
identified by means of a local path, or a URL at a certain revision (which we
might be tempted to call a "peg" revision even if we are not going to trace
history from it to another revision). This corresponds to type (2) syntax in
"svn help diff" and is handled by svn_client_diff3(). If the two objects
happen to be linked by history, then there may be scope for behavioural options
which I won't discuss yet.
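
As a rough sketch of the distinction just drawn, the choice between the two entry points depends only on whether one object or two are being identified. The function diff_entry_point and its parameters are hypothetical; it merely returns the name of the libsvn_client function mentioned above and does not call into Subversion.

```python
from typing import Optional


def diff_entry_point(old_target: str, new_target: Optional[str] = None) -> str:
    """Name the libsvn_client function that matches the shape of the request."""
    if new_target is None:
        # Type (a): one object, compared across two revisions of its own
        # line of history.
        return "svn_client_diff_peg3"
    # Type (b): two separately identified objects, which may be unrelated.
    return "svn_client_diff3"
```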

I get the impression that there is some confusion about the "real" meaning of
these concepts, commands and functions, as it is easy to think that a "pegged
diff" is any of:

   * a diff where "TARGET@REV" syntax was used
   * a diff where at least one object has to be traced through history
   * a diff comparing two revisions of a single object

These are all slightly different.

Are we happy to accept the third variant, the concept of comparing two
revisions of a single object (necessarily within its own line of history), as
the primary concept that differentiates svn_client_diff3() and
svn_client_diff_peg3() and the like?

Malcolm Rowe wrote:
>>Okay, perhaps that wasn't too clear. In summary:
>>* I don't think a BASE:WORKING diff is pegged, because by definition
>> theres no history-tracing to do. However...

Has the above discussion changed your mind, at least if by "pegged" we mean my
type (a), and thus that such a diff should call svn_client_diff_peg3()?

>>* I don't mind if we just call the pegged version of diff unconditionally,
> where possible (i.e., only in the single-path case)

I don't get that. If you mean single-path as opposed to multi-path, I
certainly don't want a special case for a single path; I thought this would
work for any number of paths. If you mean single-path as opposed to two-path,
as in the "--old/--new" form, then yes, the two-path form is necessarily my
type (b), which we may colloquially refer to as "non-pegged".

>> as long as we pass a peg-revision of svn_opt_revision_unspecified for
>> the BASE:WORKING case.

I'd say the peg should be _revision_working in that case.

>>The latter currently isn't well-defined, though I'm just about to go
>>off and fix that, making the diff (and possibly merge; I've not checked)
>>functions more like the rest of the functions in libsvn_client.
> though we will still need different functions, as the _peg versions only
> take a single path.

As opposed to two paths, I assume you mean. I realise this conversation may be
obsolete by now, but if it's still relevant, why is it a problem that the _peg
functions only take a single path?

- Julian

Received on Mon Mar 13 12:19:51 2006

This is an archived mail posted to the Subversion Dev mailing list.