
Re: Q: identical files "shared" in repository?

From: Max Bowsher <maxb_at_ukf.net>
Date: 2005-04-11 19:02:31 CEST

kfogel@collab.net wrote:
> "Ph. Marek" <philipp.marek@bmlv.gv.at> writes:
>> IF we rely on MD5 being good enough (or SHA-512, or whatever else apr
>> exports in the future), we could "share" identical contents in the
>> repository.
>
> There's no need to "rely" on it -- as you pointed out later, we could
> use the checksum as the first signal, and then do a full content
> comparison to make absolutely sure.
>
> But before implementing this, we need some statistics on how much it
> would actually save us. Are identical file contents common in
> repositories? Have you done any experiments to find out? That's the
> most important step here. If it turns out to be worthwhile, the
> implementation of this optimization wouldn't be hard at all.
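Below is a minimal sketch of the measurement kfogel asks for. It assumes the revisions of interest have already been exported to an ordinary directory tree (for example with `svn export`). The function names `md5_of`, `same_bytes` and `duplicate_stats` are hypothetical, and none of this is Subversion's own code. Files are grouped by size and MD5 digest as the first signal. Each candidate is then confirmed by a full byte-for-byte comparison before it counts as a duplicate, so the result does not rely on MD5 alone.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents (the cheap first signal)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(a, b, chunk_size=65536):
    """Full content comparison, used to confirm a checksum match."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both files hit EOF together
                return True


def duplicate_stats(root):
    """Return (total_bytes, saved_bytes) for all files under root.

    saved_bytes is how much storage sharing identical contents would save.
    """
    by_key = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            by_key[(os.path.getsize(p), md5_of(p))].append(p)

    total = saved = 0
    for (size, _digest), paths in by_key.items():
        total += size * len(paths)
        # Confirm each candidate against the first copy; a file that fails
        # the byte comparison (a hash collision) is simply not counted.
        rep = paths[0]
        dups = [p for p in paths[1:] if same_bytes(rep, p)]
        saved += size * len(dups)
    return total, saved
```

Calling `duplicate_stats("/path/to/export")` returns the total bytes stored and the bytes that sharing would have saved. Comparing the two gives a rough answer to whether identical file contents are common enough to be worth the optimization.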

Consider merging changes into a long-running feature branch. Many files
will have been modified since the branch point, but many of those files
will only ever receive merged changes, and so will be identical to some
version on trunk.

It would be even nicer if Subversion could notice this commonality when
asked for a diff, and save time.

Also, consider the huge number of individual strings all holding the text
"native" (i.e. svn:eol-style property values) in a typical source code
repository!

Max.

Received on Mon Apr 11 19:08:29 2005

This is an archived mail posted to the Subversion Dev mailing list.
