Re: Populating the rep-cache

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Fri, 29 May 2015 09:50:13 +0200

On Thu, May 28, 2015 at 6:00 PM, Stefan Fuhrmann
<stefan.fuhrmann_at_wandisco.com> wrote:
> On Wed, May 27, 2015 at 8:14 PM, Philip Martin <philip.martin_at_wandisco.com>
> wrote:
>> Julian Foad <julianfoad_at_gmail.com> writes:
>> > Stefan Fuhrmann wrote:
>> >> * clear the rep-cache.db
>> >
>> > Clearing the cache and continuing operation may make subsequent
>> > commits much larger than they should be, and there is no easy way to
>> > undo that if it happens.
>> I've been thinking of writing some code to populate the rep-cache from
>> existing revisions. This code would parse the revision, a bit like
>> verify, identify checksums in that revision and add any that are found
>> to the rep-cache. This would be time consuming if run on the whole
>> repository but would run perfectly well in a separate process while the
>> repository remains live. It could also be run over a revision range
>> rather than just the whole repository, and running on a single revision
>> such as HEAD would be fast.
> Makes sense.
>> I believe the code will be relatively straightforward; if anything, it is
>> the API that is more of a problem.
>> - We could add a public svn_fs_rep_cache(). This is backend specific
>> but there is precedent: we have svn_fs_berkeley_logfiles() and
>> svn_fs_pack().
>> - We could add a more general svn_fs_optimize(). This would do backend
>> specific optimizations that may change in future versions. Perhaps
>> passing backend-specific flags?
> I think svn_fs_optimize(bool online) would make sense
> in the longer term.
> In the "offline" case, it could do anything from removing
> duplicate reps as we build the cache to sharding repos
> or repacking shards. Not that I would want to implement
> any of that soon.

I was wondering about that too. I think repopulating the rep-cache
(without the need to take the repos offline) is very interesting, but
I immediately think: functionality to repopulate the rep-cache *and*
(optionally) rewrite rev files to let them use rep sharing (i.e.
effectively deduplicating the repository) ... that would be even

But big +1 on the initial idea already for offering the ability to
rebuild a broken rep-cache (without having to dump/load).
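
To make the idea quoted above a bit more concrete, here is a minimal Python
sketch of populating a cache over a revision range. It is not Subversion
code: the table layout, the open_cache() and populate() helpers, and the
"revisions" dict (revision number -> list of (sha1, offset, size) tuples)
are all assumptions made up for illustration, standing in for whatever a
real implementation would parse out of the rev files.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a toy rep-cache. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS rep_cache ("
        " hash TEXT PRIMARY KEY,"
        " revision INTEGER NOT NULL,"
        " offset INTEGER NOT NULL,"
        " size INTEGER NOT NULL)"
    )
    return db


def populate(db, revisions, start, end):
    """Add reps found in revisions start..end (inclusive) to the cache.

    Checksums already present are left untouched, so the function can be
    re-run or run over overlapping ranges. Returns the number of new rows.
    """
    before = db.total_changes
    for rev in range(start, end + 1):
        for sha1, offset, size in revisions.get(rev, []):
            # INSERT OR IGNORE keeps the existing entry for a known hash.
            db.execute(
                "INSERT OR IGNORE INTO rep_cache VALUES (?, ?, ?, ?)",
                (sha1, rev, offset, size),
            )
        # Commit per revision so a concurrent reader sees steady progress.
        db.commit()
    return db.total_changes - before
```

Running populate() on a single revision touches only that revision's reps,
and running it twice over the same range adds nothing the second time.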

Received on 2015-05-29 09:51:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.