Re: text and diff sharing in the repository

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2000-12-08 17:17:57 CET

> Now, in the repository, 10:Q/R/W/foo.c is a new full-text exactly the
> same as 10:A/B/foo.c, and 9:Q/R/W/foo.c is a forward-delta exactly the
> same as our old friend, delta D.
>
> So are we storing the same full text twice, and the same diff twice?
> Or are diffs and texts stored in their own tables, and mentioned only
> by reference from node revisions?

At the moment, we miss that redundancy.

But as you know, how the repository stores a given node is completely
invisible to the outside world, except in terms of performance. If
the repository can figure out that the two files are identical, we
should be able to enhance it to store them both as references to the
same text, without changing the interface.
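
One way to picture "references to the same text" is a store that keys each text by a digest of its contents, so identical files collapse to one entry while readers still ask for a (revision, path) pair. This is only a minimal sketch under assumptions: the `TextStore` and `Repository` classes, their method names, and the use of SHA-1 as the key are all hypothetical and are not taken from the actual repository code.

```python
import hashlib


class TextStore:
    """Hypothetical table of full texts, keyed by a digest of their contents."""

    def __init__(self):
        self._texts = {}

    def put(self, data: bytes) -> str:
        # Identical contents always produce the same key, so a second
        # put() of the same bytes does not add a second copy.
        key = hashlib.sha1(data).hexdigest()
        self._texts.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._texts[key]

    def __len__(self) -> int:
        return len(self._texts)


class Repository:
    """Hypothetical repository: node revisions hold only a reference (key)."""

    def __init__(self):
        self._store = TextStore()
        self._nodes = {}  # (revision, path) -> key into the text store

    def write(self, revision: int, path: str, data: bytes) -> None:
        self._nodes[(revision, path)] = self._store.put(data)

    def read(self, revision: int, path: str) -> bytes:
        # The caller never sees whether the text is shared.
        return self._store.get(self._nodes[(revision, path)])

    def stored_texts(self) -> int:
        return len(self._store)


repo = Repository()
repo.write(10, "A/B/foo.c", b"int main(void) { return 0; }\n")
repo.write(10, "Q/R/W/foo.c", b"int main(void) { return 0; }\n")
```

Both paths read back the same contents, but `repo.stored_texts()` reports a single stored text. The `read` and `write` signatures are unaffected by the sharing, which is the point about the interface staying unchanged.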

As I said to Greg H., using IDs as an approximation to node
similarity is kind of gross. It seems to me that the genetic merge
tracking info could give us a better approximation to node similarity,
and also allow us to detect situations like yours. In fact, maybe we
would toss IDs as keys altogether, and instead use the delta sets as
keys directly. That would solve this problem, as well as the
gratuitous asymmetry Greg mentioned.
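
As a rough illustration of keying by delta sets, the sketch below treats each node as the set of deltas it incorporates. Two nodes built from the same deltas then get the same key regardless of path, and the size of the difference between two sets gives a crude similarity measure. Everything here is an assumption made for illustration: the `DeltaKeyedStore` class, the delta names, and the `distance` function are hypothetical, and the idea that a delta set fully determines a text is exactly the part that would need to be thought through.

```python
class DeltaKeyedStore:
    """Hypothetical store keyed by the set of deltas a node incorporates."""

    def __init__(self):
        self._by_delta_set = {}

    def put(self, deltas, text: bytes) -> frozenset:
        # A frozenset is hashable and ignores ordering, so it can serve
        # directly as a dictionary key.
        key = frozenset(deltas)
        self._by_delta_set.setdefault(key, text)
        return key

    def get(self, key: frozenset) -> bytes:
        return self._by_delta_set[key]

    def __len__(self) -> int:
        return len(self._by_delta_set)


def distance(a: frozenset, b: frozenset) -> int:
    """Number of deltas present in one node but not the other (symmetric)."""
    return len(a ^ b)


store = DeltaKeyedStore()
old = store.put(["base"], b"old text\n")
new_a = store.put(["base", "D"], b"new text\n")   # e.g. A/B/foo.c
new_q = store.put(["D", "base"], b"new text\n")   # e.g. Q/R/W/foo.c
```

Here `new_a` and `new_q` are the same key, so only two texts are stored, and `distance(old, new_a)` equals `distance(new_a, old)`, so the measure does not depend on which node is treated as the starting point.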

Of course, this needs to be thought through in detail. I don't feel
entirely comfortable with it yet.
Received on Sat Oct 21 14:36:16 2006

In reply to: Karl Fogel: "text and diff sharing in the repository"