
Re: text and diff sharing in the repository

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2000-12-08 17:17:57 CET

> Now, in the repository, 10:Q/R/W/foo.c is a new full-text exactly the
> same as 10:A/B/foo.c, and 9:Q/R/W/foo.c is a forward-delta exactly the
> same as our old friend, delta D.
>
> So are we storing the same full text twice, and the same diff twice?
> Or are diffs and texts stored in their own tables, and mentioned only
> by reference from node revisions?

At the moment, we don't take advantage of that redundancy.

But as you know, how the repository stores a given node is completely
invisible to the outside world, except in terms of performance. If
the repository can figure out that the two files are identical, we
should be able to enhance it to store them both as references to the
same text, without changing the interface.
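To make the "store them both as references to the same text" idea concrete, here is a minimal sketch. It assumes a hypothetical content-addressed text table keyed by a SHA-1 of each text's bytes, with node revisions holding only a key. `TextStore`, `NodeRevision` and `text_key` are illustrative names, not Subversion's actual schema. Because `put` accepts any bytes, a stored diff such as delta D would be shared the same way as a full text.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TextStore:
    """Hypothetical text/diff table: each distinct byte string is stored once,
    keyed by its SHA-1 digest (an assumed scheme, for illustration only)."""
    texts: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        existing = self.texts.get(key)
        # A matching digest should mean identical bytes; refuse to alias on a collision.
        if existing is not None and existing != data:
            raise ValueError("hash collision: distinct texts share a key")
        # setdefault keeps the already-stored copy, so identical texts are stored once.
        self.texts.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.texts[key]


@dataclass
class NodeRevision:
    """Hypothetical node revision that mentions its contents only by reference."""
    path: str
    revision: int
    text_key: str


store = TextStore()
contents = b"int main(void) { return 0; }\n"

# The two copies from the example: 10:A/B/foo.c and 10:Q/R/W/foo.c.
a = NodeRevision("A/B/foo.c", 10, store.put(contents))
q = NodeRevision("Q/R/W/foo.c", 10, store.put(contents))
```

Since node revisions only ever go through `get` with a key, switching from per-node storage to this shared table changes nothing the caller sees, which is the point about the interface staying fixed.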

As I said to Greg H., using IDs as an approximation to node
similarity is kind of gross. It seems to me that the genetic merge
tracking info could give us a better approximation to node similarity,
and also allow us to detect situations like yours. In fact, maybe we
could toss IDs as keys altogether, and instead use the delta sets as
keys directly. That would solve this problem, as well as the
gratuitous asymmetry Greg mentioned.

Of course, this needs to be thought through in detail. I don't feel
entirely comfortable with it yet.
Received on Sat Oct 21 14:36:16 2006

This is an archived mail posted to the Subversion Dev mailing list.