A while ago, Greg Hudson asked (in a message I can't seem to find) why
it's useful to generate IDs of successor nodes in such a way that you
can tell what they succeeded merely by examining the IDs. This
paragraph in `libsvn_fs/structure' is about that:
Since nodal revision numbers increase by one each time a delta is
added, we can compute how many deltas separate two related node
revisions simply by comparing their ID's. For example, the
distance between 100.10.3.2 and 100.12 is the distance from
100.10.3.2 to their common ancestor, 100.10 (two deltas), plus the
distance from 100.10 to 100.12 (two deltas).
Jim answered Greg, saying (paraphrased) "Try writing the deltification
code and I think you'll see why it comes in handy."
I'm wondering in exactly what circumstance it's handy. In particular,
`structure' documents one kind of node representation:
("younger" DELTA CHECKSUM)
We store a delta that transforms the next younger revision of the
node into this revision. To find the contents of this revision:
- We find the unparsed form of the NODE-REVISION skel for the next
younger revision.
- We apply DELTA to that string, yielding the unparsed form of the
NODE-REVISION skel for this revision.
I can see how, given this representation, being able to deduce the ID
of the next-younger node is really handy. However, for a long time
I've been thinking that a more general representation would be:
("delta" BASE-ID DELTA CHECKSUM)
We store a delta that transforms the contents node BASE-ID into
this revision. To find the contents of this revision:
- We get the unparsed form of the NODE-REVISION skel for the
node identified by BASE-ID (it may itself require undeltification)
- We apply DELTA to that string, yielding the unparsed form of the
NODE-REVISION skel for this revision.
This would leave the door open for arbitrary efficiencies later; for
example, if the filesystem somehow finds out that this node is only a
tiny diff away from some other (perhaps unrelated) node, it could
replace this node's representation with a delta of the above form.
If we used such a representation, is there still an advantage to the
current ID scheme? (Note: I'm not sure there any disadvantages to it
either, and it does preserve lineage information, so I think I'd still
be uncomfortable changing it. But as we get deeper into the
filesystem, we'll probably want a clear & complete understanding of
what advantages it brings us.)
-K
Received on Sat Oct 21 14:36:23 2006