Re: question about fs node IDs

From: Kevin Pilch-Bisson <kevin_at_pilch-bisson.net>
Date: 2001-02-21 20:13:16 CET

On Wed, Feb 21, 2001 at 09:51:57AM -0600, Karl Fogel wrote:
> A while ago, Greg Hudson asked (in a message I can't seem to find) why
> it's useful to generate IDs of successor nodes in such a way that you
> can tell what they succeeded merely by examining the IDs. This
> paragraph in `libsvn_fs/structure' is about that:
>
> Since nodal revision numbers increase by one each time a delta is
> added, we can compute how many deltas separate two related node
> revisions simply by comparing their ID's. For example, the
> distance between 100.10.3.2 and 100.12 is the distance from
> 100.10.3.2 to their common ancestor, 100.10 (two deltas), plus the
> distance from 100.10 to 100.12 (two deltas).
>
> Jim answered Greg, saying (paraphrased) "Try writing the deltification
> code and I think you'll see why it comes in handy."
>
> I'm wondering in exactly what circumstance it's handy. In particular,
> `structure' documents one kind of node representation:
>
> ("younger" DELTA CHECKSUM)
> We store a delta that transforms the next younger revision of the
> node into this revision. To find the contents of this revision:
> - We find the unparsed form of the NODE-REVISION skel for the next
> younger revision.
> - We apply DELTA to that string, yielding the unparsed form of the
> NODE-REVISION skel for this revision.
>
> I can see how, given this representation, being able to deduce the ID
> of the next-younger node is really handy. However, for a long time
> I've been thinking that a more general representation would be:
>
> ("delta" BASE-ID DELTA CHECKSUM)
> We store a delta that transforms the contents node BASE-ID into
> this revision. To find the contents of this revision:
> - We get the unparsed form of the NODE-REVISION skel for the
> node identified by BASE-ID (it may itself require undeltification)
> - We apply DELTA to that string, yielding the unparsed form of the
> NODE-REVISION skel for this revision.
>
> This would leave the door open for arbitrary efficiencies later; for
> example, if the filesystem somehow finds out that this node is only a
> tiny diff away from some other (perhaps unrelated) node, it could
> replace this node's representation with a delta of the above form.

I like this idea much better than the "younger" form, which as I see it,
is a holdover from the days of RCS. Why should we only be able to have
deltas with respect to neighbourly revisions?

>
> If we used such a representation, is there still an advantage to the
> current ID scheme? (Note: I'm not sure there any disadvantages to it
> either, and it does preserve lineage information, so I think I'd still
> be uncomfortable changing it. But as we get deeper into the
> filesystem, we'll probably want a clear & complete understanding of
> what advantages it brings us.)
>
Not having been too involved in the fs code, I am not sure about
remaining advantages, but like you, I also can't see any disadvantages.

All in all, I think this is a good suggestion.

> -K

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

application/pgp-signature attachment: stored

Received on Sat Oct 21 14:36:23 2006

This message: [ Message body ]
Next message: Greg Hudson: "Re: question about fs node IDs"
Previous message: Karl Fogel: "question about fs node IDs"
In reply to: Karl Fogel: "question about fs node IDs"
Next in thread: Greg Hudson: "Re: question about fs node IDs"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]