question about fs node IDs

From: Karl Fogel <kfogel_at_galois.collab.net>
Date: 2001-02-21 16:51:57 CET

A while ago, Greg Hudson asked (in a message I can't seem to find) why
it's useful to generate IDs of successor nodes in such a way that you
can tell what they succeeded merely by examining the IDs. This
paragraph in `libsvn_fs/structure' is about that:

   Since nodal revision numbers increase by one each time a delta is
   added, we can compute how many deltas separate two related node
   revisions simply by comparing their ID's. For example, the
   distance between 100.10.3.2 and 100.12 is the distance from
   100.10.3.2 to their common ancestor, 100.10 (two deltas), plus the
   distance from 100.10 to 100.12 (two deltas).
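
The arithmetic in that paragraph can be sketched roughly as follows. This is only an illustration, not code from libsvn_fs: the names `parse_id` and `delta_distance` are hypothetical, and it assumes an ID is a dotted sequence of (node-or-branch number, revision) pairs in which the first revision on a branch is one delta away from its branch point, as the worked example suggests.

```python
def parse_id(node_id):
    """Split '100.10.3.2' into [(100, 10), (3, 2)].

    Assumed layout: (node number, revision), then zero or more
    (branch number, revision) pairs.
    """
    parts = [int(p) for p in node_id.split(".")]
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError("malformed node revision ID: %r" % node_id)
    return list(zip(parts[0::2], parts[1::2]))


def delta_distance(id_a, id_b):
    """Count the deltas separating two related node revisions."""
    a, b = parse_id(id_a), parse_id(id_b)
    if a[0][0] != b[0][0]:
        raise ValueError("IDs belong to different nodes; no common ancestor")

    # Skip the (number, revision) pairs the two IDs share exactly.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    tail_a, tail_b = a[i:], b[i:]

    # If both IDs continue along the same line of development, the common
    # ancestor is the smaller of the two revisions on that line.
    shared = 0
    if tail_a and tail_b and tail_a[0][0] == tail_b[0][0]:
        shared = min(tail_a[0][1], tail_b[0][1])

    # Each remaining revision number counts the deltas along its line.
    return (sum(rev for _, rev in tail_a)
            + sum(rev for _, rev in tail_b)
            - 2 * shared)


print(delta_distance("100.10.3.2", "100.12"))
```

For the example in the quoted text, the common ancestor is 100.10 and the two legs contribute two deltas each.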

Jim answered Greg, saying (paraphrased) "Try writing the deltification
code and I think you'll see why it comes in handy."

I'm wondering in exactly what circumstance it's handy. In particular,
`structure' documents one kind of node representation:

   ("younger" DELTA CHECKSUM)
       We store a delta that transforms the next younger revision of the
       node into this revision. To find the contents of this revision:
       - We find the unparsed form of the NODE-REVISION skel for the next
         younger revision.
       - We apply DELTA to that string, yielding the unparsed form of the
         NODE-REVISION skel for this revision.

I can see how, given this representation, being able to deduce the ID
of the next-younger node is really handy. However, for a long time
I've been thinking that a more general representation would be:

   ("delta" BASE-ID DELTA CHECKSUM)
       We store a delta that transforms the contents of node BASE-ID into
       this revision. To find the contents of this revision:
       - We get the unparsed form of the NODE-REVISION skel for the
         node identified by BASE-ID (it may itself require undeltification).
       - We apply DELTA to that string, yielding the unparsed form of the
         NODE-REVISION skel for this revision.
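
As a rough sketch of how lookup might work under the proposed "delta" representation: the store layout, the toy copy/insert delta format, the "fulltext" base case, the MD5 checksum, and the names `apply_delta` and `node_contents` are all assumptions made for illustration, not anything defined in `structure'.

```python
import hashlib


def apply_delta(base, delta):
    """Apply a toy delta: a list of ("copy", offset, length) and
    ("insert", data) operations evaluated against the base string."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError("unknown delta op: %r" % (op[0],))
    return bytes(out)


def node_contents(store, node_id):
    """Return the unparsed contents for node_id.

    store maps an ID to either ("fulltext", data) or
    ("delta", base_id, delta, checksum). The base may be any node,
    related or not, and may itself need undeltification.
    """
    rep = store[node_id]
    if rep[0] == "fulltext":
        return rep[1]
    _, base_id, delta, checksum = rep
    text = apply_delta(node_contents(store, base_id), delta)
    if hashlib.md5(text).hexdigest() != checksum:
        raise ValueError("checksum mismatch for %s" % node_id)
    return text


# Usage: node 100.2 is stored as a delta against the unrelated node 57.3.
target = b"hello there"
store = {
    "57.3": ("fulltext", b"hello world"),
    "100.2": ("delta", "57.3",
              [("copy", 0, 6), ("insert", b"there")],
              hashlib.md5(target).hexdigest()),
}
print(node_contents(store, "100.2"))
```

Note that nothing in this lookup depends on the shape of the IDs; BASE-ID is followed as an opaque key.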

This would leave the door open for arbitrary efficiencies later; for
example, if the filesystem somehow finds out that this node is only a
tiny diff away from some other (perhaps unrelated) node, it could
replace this node's representation with a delta of the above form.

If we used such a representation, is there still an advantage to the
current ID scheme? (Note: I'm not sure there are any disadvantages to it
either, and it does preserve lineage information, so I think I'd still
be uncomfortable changing it. But as we get deeper into the
filesystem, we'll probably want a clear & complete understanding of
what advantages it brings us.)

-K
Received on Sat Oct 21 14:36:23 2006
