Greg Hudson <ghudson@MIT.EDU> writes:
> > However, the advantage that the RCS-style numbers have is that,
> > given two revision numbers, you can determine their relationship (if
> > any) simply by inspecting the numbers --- no I/O is necessary. In
> > your system, you might need to walk the node's entire history in the
> > database before you could be sure they weren't related. This
> > ability to quickly recognize related nodes was useful in computing
> > deltas.
> Assuming you actually meant "was useful" in that tense, can
> you describe where you actually wind up using this feature of
> node-revision IDs?
Check out libsvn_fs/delta.c:replace.
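To make the quoted "no I/O is necessary" claim concrete, here's a minimal sketch. It is not the libsvn_fs code. It assumes IDs are dotted integer strings with an even number of components, read as (node number, revision) followed by (branch number, revision) pairs in the RCS manner, and it treats IDs with different node numbers as unrelated. The function names parse_id, common_ancestor and format_id are hypothetical. The nearest common ancestor falls out of comparing components alone.

```python
from typing import List, Optional, Tuple

NodeRevId = Tuple[int, ...]


def parse_id(text: str) -> NodeRevId:
    """Parse a dotted ID such as "1.3.2.1" into a tuple of ints."""
    parts = tuple(int(p) for p in text.split("."))
    if not parts or len(parts) % 2 != 0:
        raise ValueError(f"expected an even number of components: {text!r}")
    return parts


def common_ancestor(a: NodeRevId, b: NodeRevId) -> Optional[NodeRevId]:
    """Return the nearest common ancestor of a and b, or None if unrelated.

    Assumption: a[0] is the node number, so different node numbers mean
    the two node-revisions share no history.
    """
    if a[0] != b[0]:
        return None
    shared: List[int] = []
    for i in range(0, min(len(a), len(b)), 2):
        if a[i] != b[i]:
            # Different branches: they diverged at the revision already recorded.
            break
        rev_a, rev_b = a[i + 1], b[i + 1]
        # On the same line, the earlier revision is the one both descend from.
        shared += [a[i], min(rev_a, rev_b)]
        if rev_a != rev_b:
            break
    return tuple(shared)


def format_id(node_rev: Optional[NodeRevId]) -> str:
    """Render an ID back in dotted form, or "unrelated" for None."""
    return "unrelated" if node_rev is None else ".".join(map(str, node_rev))
```

For instance, 1.3.2.1 and 1.5 share 1.3, while 1.3 and 2.3 come back unrelated, all without touching a database.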
> I have concerns about its actual usefulness. Suppose I create every C
> source file in a source tree by copying copyright-template.c to a new
> file, and every directory by copying directory-template to the new
> place. Now all my C source files live in the same node and all my
> directories live in the same node, but they don't have much in common.
> What are you going to do with the information that two node-revisions
> live in the same node? You can also compute the common ancestor of
> two node-revisions, but again, what are you going to do with that?
That would stymie the test for complete unrelatedness, true. But you
could still tell which of two files was more closely related to a
third, which is what delta.c uses.
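A hypothetical sketch of that "more closely related" test, under the same assumed ID layout (pairs of node or branch number and revision, node number first). It is an illustration of the idea, not what delta.c actually does. Distance here is the number of steps along the ancestry tree, computed from the IDs alone; depth, shared_depth, distance and closest are invented names.

```python
from typing import Optional, Sequence, Tuple

NodeRevId = Tuple[int, ...]


def depth(node_rev: NodeRevId) -> int:
    """Steps from the node's first revision, counting x.1 as depth 1.

    Each (branch, revision) pair adds its revision number, because a
    branch's revision 1 is one step past its branch point.
    """
    return sum(node_rev[1::2])


def shared_depth(a: NodeRevId, b: NodeRevId) -> Optional[int]:
    """Depth of the nearest common ancestor, or None if unrelated."""
    if a[0] != b[0]:
        return None  # assumed: different node numbers share no history
    total = 0
    for i in range(0, min(len(a), len(b)), 2):
        if a[i] != b[i]:
            break  # different branches diverge at the point already counted
        total += min(a[i + 1], b[i + 1])
        if a[i + 1] != b[i + 1]:
            break
    return total


def distance(a: NodeRevId, b: NodeRevId) -> Optional[int]:
    """Steps along the ancestry tree from a to b, or None if unrelated."""
    common = shared_depth(a, b)
    if common is None:
        return None
    return depth(a) + depth(b) - 2 * common


def closest(target: NodeRevId,
            candidates: Sequence[NodeRevId]) -> Optional[NodeRevId]:
    """Pick the related candidate nearest to target, or None if none is related."""
    related = [c for c in candidates if distance(target, c) is not None]
    return min(related, key=lambda c: distance(target, c), default=None)
```

With target 1.3.2.2, the candidate 1.3.2.1 (one step away) wins over 1.5 (four steps away), and 2.1 is skipped as unrelated. As the next paragraph concedes, a small step count says nothing about how much text actually changed.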
You will now point out that the distance according to node revision
ids doesn't have any relationship to true textual relatedness --- a
single transaction can accomplish an arbitrary amount of change. And
you might suggest some distance measure related to which *semantic*
changes are present or absent in the file. Which would be great.
I've admitted before that node revision id distance is a shaky
approximation to the true notion of relatedness we all hold in our
heads. I don't really like the RCS-style ID numbers myself, but they
are concrete, simple, and get the job done. I'd like to move to
something better, but nobody has yet suggested an alternative with
semantic advantages substantial enough to justify the increased I/O costs.
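For contrast, a sketch of the I/O cost in question, assuming a hypothetical scheme where node revision IDs are opaque and each record stores only its predecessor's ID. A plain dict stands in for the database, and ancestors and related are invented names. Proving two node-revisions related can stop early, but proving them unrelated means walking both histories to the root, one read per step.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical store: opaque node revision ID -> predecessor ID (None at a root).
Store = Dict[str, Optional[str]]


def ancestors(store: Store, start: str) -> Tuple[Set[str], int]:
    """Return every ID in start's history (start included) and the reads used."""
    seen: Set[str] = set()
    reads = 0
    current: Optional[str] = start
    while current is not None:
        seen.add(current)
        current = store[current]  # one database read per step
        reads += 1
    return seen, reads


def related(store: Store, a: str, b: str) -> Tuple[bool, int]:
    """Decide whether a and b share history, reporting the total reads."""
    history_a, reads = ancestors(store, a)
    current: Optional[str] = b
    while current is not None:
        if current in history_a:
            return True, reads  # found shared history; stop early
        current = store[current]
        reads += 1
    return False, reads  # walked b all the way to its root
```

With a store of two lines, r1 <- r2 <- r3 and x1 <- x2, asking about r3 and r2 costs 3 reads, while confirming that r3 and x2 are unrelated costs 5: every record in both histories.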
I think the right solution to this will be something related to
changesets, as we discussed in Chicago. If you read the genetic
merging stuff, you won't be far off.
Received on Sat Oct 21 14:36:21 2006