
Re: [RFC] issue 2286 - space saving in the repository

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2006-05-20 17:50:45 CEST

On Sat, May 20, 2006 at 05:25:52PM +0200, Peter N. Lundblad wrote:
> Malcolm Rowe writes:
> > I looked at doing something similar a few months back (hardlinking
> > identical representations inside FSFS). I was put off by the fact that it
> > was impossible to do it streamily (you'd need to rewrite the rev file,
> > which wasn't something I was happy doing as an online operation) and
>
> I haven't tried, but my idea was to, when a file is finished (the NULL
> delta window is received or the generic stream is closed), you check
> the representation index. If you get a match, couldn't you just seek
> to the beginning of the representation and rewrite it to point at the
> other representation. Then just truncate the revision file.
>

Yes, that should work. There was some reason that I didn't want to do
that at the time, but I can't think why, so it probably wasn't important.
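A minimal sketch of the seek-back-and-truncate idea discussed above, assuming an in-memory, append-only "revision file" and a plain dict as the representation index. The function name, the dict index and the `REF <offset>` pointer format are all hypothetical and are not the FSFS on-disk format. It also trusts an MD5 match; a real implementation would need to compare contents as well.

```python
import hashlib
import io


def finish_representation(rev_file, start, rep_index):
    """Deduplicate the representation just written at the end of rev_file.

    rev_file:  seekable binary file; the new representation runs from
               `start` to end-of-file (it was the last thing written).
    rep_index: hypothetical dict mapping MD5 hex digest -> offset of an
               existing representation.
    Returns the offset of the representation the node should refer to.
    """
    rev_file.seek(start)
    digest = hashlib.md5(rev_file.read()).hexdigest()
    existing = rep_index.get(digest)
    if existing is None:
        # New contents: record them and leave the bytes in place.
        rep_index[digest] = start
        return start
    # Duplicate contents: seek back, overwrite the new representation with
    # a short (hypothetical) pointer to the existing one, then truncate.
    rev_file.seek(start)
    rev_file.write(b"REF %d\n" % existing)
    rev_file.truncate()
    return existing


if __name__ == "__main__":
    rev = io.BytesIO()
    index = {}
    for contents in (b"hello world\n", b"hello world\n"):
        start = rev.seek(0, io.SEEK_END)
        rev.write(contents)
        print(finish_representation(rev, start, index))
    print(rev.getvalue())
```

Because the duplicate is always the last thing in the file, truncating at the end of the pointer discards it without rewriting anything earlier in the file.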

> > also because the FSFS backend at least seems to use the representation
> > key as a way to determine whether two nodes are 'the same' - and it's
> > not immediately clear whether the kind of sameness is one that should
> > also be true for hard-linked nodes.
> >
> If that's the case, we could introduce a proxy representation to be
> able to detect the difference, couldn't we?
>

Yes, and that's what Philip's proposal does, in effect. I'm not sure
whether we actually _need_ it though, and as Max points out, UNIX
hard-links don't have a 'master' copy.
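To illustrate the proxy-representation idea in general terms (this is not a description of Philip's actual proposal), here is a sketch that assumes each node-revision gets its own proxy key while proxies for identical contents share one underlying data key. The `NodeRev` type and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRev:
    """Hypothetical node-revision model, not an FSFS structure."""
    path: str
    proxy_key: str  # unique per node-revision
    data_key: str   # shared when the contents are identical


def same_node_rep(a, b):
    """A 'sameness' check that compares the representation keys the
    noderevs point at directly, so hard-linked nodes still differ."""
    return a.proxy_key == b.proxy_key


def same_contents(a, b):
    """Content equality: follow the proxies to the shared data."""
    return a.data_key == b.data_key
```

Code that relies on representation keys for "is this the same node?" keeps its old meaning, while a content comparison can see through the proxy.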

btw, the reason I started to look at this wasn't for space-saving (though
I realised that that was an advantage) - rather, I noticed that if we
had some way to go from a representation to the noderevs that share that
representation, we'd be able to create a form of 'svn locate' command: one
that could answer questions like "In what other branches/tags/revisions
does this file (or a file with the same contents) exist?".
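The reverse lookup that such an 'svn locate' command would need can be sketched as a map from a representation key to every (revision, path) that uses it. The `RepIndex` class and its methods are hypothetical; Subversion has no such command or API.

```python
from collections import defaultdict


class RepIndex:
    """Hypothetical reverse index: representation key -> node-revisions."""

    def __init__(self):
        self._users = defaultdict(list)

    def add(self, rep_key, revision, path):
        """Record that `path` in `revision` uses representation `rep_key`."""
        self._users[rep_key].append((revision, path))

    def locate(self, rep_key):
        """Answer 'where else do these exact contents exist?'"""
        # .get() avoids creating an empty entry for unknown keys.
        return sorted(self._users.get(rep_key, []))
```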

An MD5->representation index was the obvious solution for this, and
coalescing identical representations not only saved space, but meant
that we'd not have to check for MD5 collisions at locate time (though
we'd need to check at commit time, and have some kind of uniquifier for
handling non-identical files with identical MD5s).
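A sketch of the commit-time scheme described above, assuming representations are keyed by an (MD5, uniquifier) pair. Identical contents coalesce into one entry. Different contents that happen to share an MD5 get the next uniquifier, so a later lookup by key never has to re-check for collisions. The `RepStore` class is hypothetical and only illustrates the idea.

```python
import hashlib


class RepStore:
    """Hypothetical content-addressed store keyed by (md5, uniquifier)."""

    def __init__(self):
        self._reps = {}  # (md5 hex digest, uniquifier) -> contents

    def commit(self, data):
        """Store `data`, reusing an identical representation if one exists."""
        digest = hashlib.md5(data).hexdigest()
        n = 0
        while (digest, n) in self._reps:
            if self._reps[(digest, n)] == data:
                return (digest, n)  # identical contents: reuse existing rep
            n += 1                  # genuine MD5 collision: try next slot
        self._reps[(digest, n)] = data
        return (digest, n)

    def __len__(self):
        return len(self._reps)
```

The full-content comparison runs only when the MD5 already exists in the store, so in the common case committing a file costs one hash and one dictionary lookup.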

Regards,
Malcolm

Received on Sat May 20 17:51:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
