Re: Efficiency of rep-sharing (deduplication) in 1.8 and later

From: Mark Phippard <markphip_at_gmail.com>
Date: Fri, 12 Sep 2014 11:24:43 -0400

On Fri, Sep 12, 2014 at 11:17 AM, Thomas Harold <thomas-lists_at_nybeta.com>
wrote:

> I have a question about how efficient SVN is at de-duplication within a
> repository with regards to files that appear in multiple locations, but
> which have the same content.
>
> I know a small improvement was made in 1.8...
>
> http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements
>
> > When representation sharing has been enabled, Subversion 1.8 will now
> > be able to detect files and properties with identical contents within
> > the same revision and only store them once. This is a common
> > situation when you for instance import a non-incremental dump file or
> > when users apply the same change to multiple branches in a single
> > commit.
>
> #1 - If a commit puts files A, B and C into the repository, and a later
> commit puts files B, C and D into the repository at a different
> location, is SVN smart enough to realize that B and C are already stored
> in the repository?
>
> In other words, does it track each individual file separately, even if
> they were all part of one big revision?
>

The representation cache is keyed on the SHA-1 checksum of the rep, so it
does not matter what the filename is or where it is stored. If a new rep
has the same SHA-1 as an existing rep, then it will be shared.

The small improvement in 1.8 was simply to do this for files being added
within the same revision, but the other scenario was already supported.

I think it is worth pointing out that a rep is not necessarily a "file".
It is the specific delta that SVN would be storing in the repository DB.
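
As a rough illustration of the lookup described above, here is a toy
Python sketch. It is not Subversion's actual code: the class name
RepCache, the store() method and the byte strings are all made up for
the example, and a real rep is whatever FSFS would write, not
necessarily a whole file. It only shows the idea of keying stored reps
by SHA-1 so that path and revision do not matter.

```python
import hashlib


class RepCache:
    """Toy model of rep-sharing: one stored rep per distinct SHA-1."""

    def __init__(self):
        # SHA-1 hex digest -> (revision, path) where the rep was first stored
        self._by_sha1 = {}

    def store(self, revision, path, rep_bytes):
        """Return (location, shared) for the rep being committed."""
        key = hashlib.sha1(rep_bytes).hexdigest()
        if key in self._by_sha1:
            # Same checksum as an existing rep: point at it, store nothing new.
            return self._by_sha1[key], True
        self._by_sha1[key] = (revision, path)
        return (revision, path), False


cache = RepCache()

# r1 adds A, B and C.
for name, data in [("A", b"aaa"), ("B", b"bbb"), ("C", b"ccc")]:
    cache.store(1, "/trunk/" + name, data)

# r2 adds B, C and D at a different location.
results = {
    name: cache.store(2, "/other/" + name, data)
    for name, data in [("B", b"bbb"), ("C", b"ccc"), ("D", b"ddd")]
}
```

In this sketch, B and C in r2 resolve to the entries first stored in r1,
and only D is stored as something new. Calling store() twice with the
same bytes in one revision is shared in the same way, which corresponds
to the same-revision case that 1.8 added.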

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
Received on 2014-09-12 17:25:14 CEST

This is an archived mail posted to the Subversion Users mailing list.
