Re: Subversion branch deltification policy is more space-hungry than CVS

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2004-05-29 18:49:57 CEST

On Fri, 2004-05-28 at 11:11, Vincent Lefevre wrote:
> If I've understood correctly, there seems to be a problem, though,
> when directories or files are moved/renamed. It has been said that
> this was implemented as a copy+delete. But Mark Benedetto King said
> that fulltexts live forever, even if one deletes the branch (here
> the branch would correspond to the original filename). And as the
> copy would normally be modified since it is the main file (under a
> new name), this would give a second fulltext.

This argument makes perfect sense, but it turns out to be wrong. It is
only when lines of development diverge that you get multiple plaintext
representations of a file, not simply when a new copy-ID is created. In
your example, the node-rev history of the file might look something
like:

1.1.1 -- 1.1.2 -- 1.1.3
                        \
                          1.2.4 -- 1.2.5

(The first number is the node-ID, always 1 in this example, the second
number is the copy-ID, which in this example is 1 before the move and 2
after, and the third number is the txn-ID.) When 1.2.4 is committed,
the deltification process walks along the predecessor list without
respect to copy-ID changes, so 1.1.3 is deltified against 1.2.4.
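
A minimal Python sketch of the move case above, for illustration only. It
assumes a simplified model in which each commit re-deltifies just the
immediate predecessor against the new node-rev; the names NodeRev, commit
and fulltexts are hypothetical and are not Subversion's actual code or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeRev:
    """Hypothetical node-revision: node-ID, copy-ID, txn-ID."""
    node_id: int
    copy_id: int
    txn_id: int
    predecessor: Optional["NodeRev"] = None
    # None means this node-rev is stored as a fulltext (plaintext).
    delta_base: Optional["NodeRev"] = None

    @property
    def name(self) -> str:
        return f"{self.node_id}.{self.copy_id}.{self.txn_id}"


def commit(node_id: int, copy_id: int, txn_id: int,
           predecessor: Optional[NodeRev] = None) -> NodeRev:
    """Create a node-rev and deltify its predecessor against it.

    The copy-ID is never consulted: only the predecessor link matters.
    """
    new = NodeRev(node_id, copy_id, txn_id, predecessor)
    if predecessor is not None:
        predecessor.delta_base = new
    return new


def fulltexts(revs: List[NodeRev]) -> List[str]:
    """Names of the node-revs still stored as fulltext."""
    return [r.name for r in revs if r.delta_base is None]


# The move example: the copy-ID changes from 1 to 2 at txn 4.
a = commit(1, 1, 1)
b = commit(1, 1, 2, a)
c = commit(1, 1, 3, b)
d = commit(1, 2, 4, c)   # the move creates a new copy-ID
e = commit(1, 2, 5, d)

move_fulltexts = fulltexts([a, b, c, d, e])
```

In this model only the newest node-rev, 1.2.5, remains a fulltext, and
1.1.3 ends up as a delta against 1.2.4 even though their copy-IDs differ.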

If the file were not moved but were instead copied and then changed in
both locations, then the node-rev history might look like:

1.1.1 -- 1.1.2 -- 1.1.3 -- 1.1.6
                        \
                          1.2.4 -- 1.2.5

When 1.1.6 is committed, 1.1.3 is deltified against it, smashing the
previous delta representation against 1.2.4. This is not because 1.1.6
has the same copy-ID as 1.1.3, but merely because (in this example)
1.1.6 came along later. (Ideally, we might be able to notice that 1.1.3
already has a delta representation with the same distance, and pick
whichever delta representation is smaller. But we're not that smart,
and with the current schema we don't have any way of being that smart
because we don't have a distance field in representations.)
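
The diverging case can be replayed with a similarly simplified Python
sketch. It assumes that commits arrive in txn-ID order and that each commit
re-deltifies only its immediate predecessor, replacing any delta that
predecessor already had; the replay function and its data layout are
hypothetical, not Subversion's implementation.

```python
from typing import Dict, List, Optional, Tuple


def replay(commits: List[Tuple[str, Optional[str]]]):
    """Replay (name, predecessor_name) pairs in commit order.

    Returns the final delta-base map (None means fulltext) and a log of
    (node_rev, old_base, new_base) entries for every delta that was
    replaced by a later commit.
    """
    delta_base: Dict[str, Optional[str]] = {}
    overwritten: List[Tuple[str, str, str]] = []
    for name, pred in commits:
        delta_base[name] = None          # a new node-rev starts as fulltext
        if pred is not None:
            old = delta_base[pred]
            if old is not None:
                # The predecessor already had a delta: it gets smashed.
                overwritten.append((pred, old, name))
            delta_base[pred] = name      # the later commit always wins
    return delta_base, overwritten


history = [
    ("1.1.1", None),
    ("1.1.2", "1.1.1"),
    ("1.1.3", "1.1.2"),
    ("1.2.4", "1.1.3"),   # the copy
    ("1.2.5", "1.2.4"),   # change made in the copy
    ("1.1.6", "1.1.3"),   # change made in the original location
]

delta_base, overwritten = replay(history)
fulltexts = sorted(n for n, base in delta_base.items() if base is None)
```

Here 1.1.3 finishes as a delta against 1.1.6, its earlier delta against
1.2.4 is the one entry in the overwritten log, and both 1.1.6 and 1.2.5
are left as fulltexts, one per line of development.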

In case it hasn't filtered through, incidentally, FSFS does not have
multiple plaintexts even when lines of development diverge. So, in
Subversion 1.1,
if users are seeing unacceptable expansions of their CVS repositories,
they will have an alternative which doesn't use as much space. (And
while it does use more time for a head checkout than BDB does, it scales
way better than CVS does for branches.)

Received on Sat May 29 18:50:31 2004

This is an archived mail posted to the Subversion Dev mailing list.
