
Re: Q: identical files "shared" in repository?

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2005-04-11 20:16:15 CEST

On Mon, 2005-04-11 at 01:39, Ph. Marek wrote:
> Say, I've got a repository with trunk T and branch A.
> If there are changes to T which get merged into A (and committed to A) it's
> very likely that files changed by the merge have not been tampered with on
> the branch, i.e. they are only modified in the trunk.
> If that gets committed, the difference on A gets committed into the
> repository, right? Or are just pointers to trunk's file contents stored?

The differences are stored twice: once for the commit to the trunk and
once for the merge commit on the branch.

There are two ways to look at fixing this:

  #1: Notice that the checksums of a new file and an old file are the
same, do a full-text comparison to make sure (or don't), and re-use the
representation of the old file. There are some ramifications for the
skip-delta code, but this could have space-saving benefits in a variety
of circumstances.

  #2: When performing a merge, notice when the result of the merge is
the same as the merged ancestor. Communicate this to the repository
somehow when performing the commit. In the repository, reuse the merged
ancestor's representation instead of storing a diff against the
historical ancestor. Again, there are some ramifications for the
skip-delta code. The benefits wouldn't be as general as those of #1,
but we wouldn't need to create machinery in the FS to be able to look up
file checksums. (We would have to extend the editor API and RA
protocols instead.)

  Variants on #2: we could store a diff against the merged ancestor if
such a diff would be smaller than the diff against the historical
ancestor, not just if the merged ancestor is identical to the result.
Also, we could try to apply #2 to directories as well as files; e.g. if
a branch modifies only the "tools" directory, merges of the trunk could
reuse the trunk's representation of the entire "subversion" directory
and other unchanged subdirs of the trunk.
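
As a rough illustration of #1, here is a minimal Python sketch of a
checksum-keyed store that reuses an existing representation when new
content matches it. Everything in it is hypothetical: RepStore, put(),
and the choice of SHA-256 are stand-ins for illustration, not the
Subversion FS API, and the sketch ignores deltas and the skip-delta
ramifications entirely.

```python
import hashlib


class RepStore:
    """Toy representation store (hypothetical, not Subversion's FS)."""

    def __init__(self, verify_fulltext=True):
        self.reps = {}           # rep id -> full text (bytes)
        self.by_checksum = {}    # hex digest -> list of rep ids
        self.verify_fulltext = verify_fulltext
        self._next_id = 0

    def put(self, data: bytes) -> int:
        """Store data and return its rep id, reusing a matching rep."""
        digest = hashlib.sha256(data).hexdigest()
        for rep_id in self.by_checksum.get(digest, []):
            # "do a full-text comparison to make sure (or don't)"
            if not self.verify_fulltext or self.reps[rep_id] == data:
                return rep_id
        # No existing rep matches, so allocate a new one.
        rep_id = self._next_id
        self._next_id += 1
        self.reps[rep_id] = data
        self.by_checksum.setdefault(digest, []).append(rep_id)
        return rep_id


store = RepStore()
trunk_rep = store.put(b"int main(void) { return 0; }\n")
# The same content arriving on branch A via a merge maps to the same rep.
branch_rep = store.put(b"int main(void) { return 0; }\n")
other_rep = store.put(b"int main(void) { return 1; }\n")
```

The checksum index is the "machinery in the FS to be able to look up
file checksums" that #2 would avoid; the verify_fulltext flag mirrors
the optional full-text comparison.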

Neither change is easy, unfortunately.

In addition to collecting statistics on how much these solutions might
save us, we might also want to think about how the current scheme might
be distorting repository practices. For instance, how many Subversion
development branches are we willing to tolerate, given that each one
generates a weekly merge of all the changes to the trunk?
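
One crude way to start collecting such statistics is to count how many
full-text bytes are exact duplicates of content already seen. The
Python sketch below assumes a hypothetical input of (path, contents)
pairs and a hypothetical function name; it is not an existing
Subversion tool. Because it counts full texts while the repository
stores deltas, it only gives an upper bound on what #1 might save.

```python
import hashlib


def duplicate_fulltext_bytes(contents):
    """Return (total_bytes, redundant_bytes) for (path, bytes) pairs.

    A file counts as redundant if an earlier file had identical content.
    """
    first_seen = {}          # digest -> path where content first appeared
    total = 0
    redundant = 0
    for path, data in contents:
        total += len(data)
        digest = hashlib.sha256(data).digest()
        if digest in first_seen:
            redundant += len(data)
        else:
            first_seen[digest] = path
    return total, redundant


# Made-up sample data, for illustration only.
sample = [
    ("trunk/a.c", b"aaaa"),
    ("branches/A/a.c", b"aaaa"),   # merged copy, identical to the trunk
    ("trunk/b.c", b"bb"),
]
total, redundant = duplicate_fulltext_bytes(sample)
```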

Received on Mon Apr 11 20:17:24 2005

This is an archived mail posted to the Subversion Dev mailing list.