Re: Populating the rep-cache

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 28 May 2015 18:00:38 +0200

On Wed, May 27, 2015 at 8:14 PM, Philip Martin <philip.martin_at_wandisco.com>
wrote:

> Julian Foad <julianfoad_at_gmail.com> writes:
>
> > Stefan Fuhrmann wrote:
> >> * clear the rep-cache.db
> >
> > Clearing the cache and continuing operation may make subsequent
> > commits much larger than they should be, and there is no easy way to
> > undo that if it happens.
>
> I've been thinking of writing some code to populate the rep-cache from
> existing revisions. This code would parse the revision, a bit like
> verify, identify checksums in that revision and add any that are found
> to the rep-cache. This would be time consuming if run on the whole
> repository but would run perfectly well in a separate process while the
> repository remains live. It could also be run over a revision range
> rather than just the whole repository, and running on a single revision
> such as HEAD would be fast.
>

Makes sense.
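
To make the idea concrete, here is a rough, self-contained sketch of
"scan a revision range and add any checksums not yet in the cache".
It is a toy in-memory model, not FSFS code: every toy_* type and
function is made up for illustration, the checksums are short
placeholder strings rather than real SHA-1 digests, and the real
rep-cache lives in rep-cache.db rather than in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64   /* capacity of the toy cache (assumption) */
#define TOY_MAX_REPS 4       /* max representations per toy revision */

/* One representation as a revision scanner might report it. */
typedef struct {
  const char *checksum;      /* placeholder for a SHA-1 hex digest */
  long offset;               /* position of the rep within its revision */
} toy_rep_t;

/* All representations found while parsing one revision. */
typedef struct {
  size_t rep_count;
  toy_rep_t reps[TOY_MAX_REPS];
} toy_revision_t;

/* One cache row: checksum -> location of an existing representation. */
typedef struct {
  const char *checksum;
  long revision;
  long offset;
} toy_entry_t;

typedef struct {
  size_t count;
  toy_entry_t entries[TOY_MAX_ENTRIES];
} toy_rep_cache_t;

/* Add CHECKSUM unless it is already cached.
   Return 1 if a new entry was added, 0 if it was present or the
   cache is full. */
int
toy_cache_add(toy_rep_cache_t *cache, const char *checksum,
              long revision, long offset)
{
  size_t i;

  for (i = 0; i < cache->count; i++)
    if (strcmp(cache->entries[i].checksum, checksum) == 0)
      return 0;

  if (cache->count == TOY_MAX_ENTRIES)
    return 0;

  cache->entries[cache->count].checksum = checksum;
  cache->entries[cache->count].revision = revision;
  cache->entries[cache->count].offset = offset;
  cache->count++;
  return 1;
}

/* Scan revisions START..END (inclusive) of REVS and add every
   checksum that is not cached yet.  Return the number of entries
   added. */
size_t
toy_populate_range(toy_rep_cache_t *cache, const toy_revision_t *revs,
                   size_t rev_count, long start, long end)
{
  size_t added = 0;
  long rev;

  assert(start >= 0 && start <= end && (size_t)end < rev_count);

  for (rev = start; rev <= end; rev++)
    {
      size_t i;

      for (i = 0; i < revs[rev].rep_count; i++)
        added += (size_t)toy_cache_add(cache,
                                       revs[rev].reps[i].checksum,
                                       rev, revs[rev].reps[i].offset);
    }

  return added;
}
```

Calling toy_populate_range() with start == end == HEAD corresponds to
the fast single-revision case; a later run over the full range only
adds what is still missing, so the two can be combined freely.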

> I believe the code will be relatively straightforward; if anything it is
> the API that is more of a problem.
>
> - We could add a public svn_fs_rep_cache(). This is backend specific
> but there is precedent: we have svn_fs_berkeley_logfiles() and
> svn_fs_pack().
>
> - We could add a more general svn_fs_optimize(). This would do backend
> specific optimizations that may change in future versions. Perhaps
> passing backend-specific flags?
>

I think svn_fs_optimize(bool online) would make sense
in the longer term.

In the "offline" case, it could do anything from removing
duplicate reps as we build the cache to sharding repos
or repacking shards. Not that I would want to implement
any of that soon.

OTOH, a new FS API only makes sense if we can control
it nicely and generically from svnadmin or its ilk. It seems
to me that a generic "make stuff better" optimize run has
its merits (e.g. after an svnadmin upgrade) but most people
probably want to tune only specific aspects. That's because
they are likely to have large repos that they can't take
offline for long.
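
A minimal sketch of how the online/offline switch and the "tune only
specific aspects" selection could fit together. Nothing here exists
in Subversion: the TOY_OPT_* flags and toy_optimize_plan() are
hypothetical, and which steps count as safe on a live repository is
an assumption taken from the discussion above (only rep-cache
population is treated as online-safe).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical optimization steps, one bit each. */
#define TOY_OPT_POPULATE_REP_CACHE 0x1u  /* add missing checksums */
#define TOY_OPT_DEDUP_REPS         0x2u  /* remove duplicate reps */
#define TOY_OPT_SHARD              0x4u  /* shard the repository */
#define TOY_OPT_REPACK             0x8u  /* repack existing shards */
#define TOY_OPT_ALL                0xFu

/* Assumption: only rep-cache population may run while the
   repository stays live. */
#define TOY_OPT_ONLINE_SAFE TOY_OPT_POPULATE_REP_CACHE

/* Return the subset of REQUESTED steps that may run.
   When ONLINE is true, steps that need exclusive access are
   dropped; when it is false, everything requested is allowed. */
unsigned
toy_optimize_plan(bool online, unsigned requested)
{
  /* Reject bits that do not name a known step. */
  assert((requested & ~TOY_OPT_ALL) == 0);

  return online ? (requested & TOY_OPT_ONLINE_SAFE) : requested;
}
```

A front end such as svnadmin could map per-aspect options onto the
requested mask and pass TOY_OPT_ALL for the generic "make stuff
better" run.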

> - We could add the behaviour to svn_fs_recover() by revving the function
> with a revision range. This would "recover" the rep-cache after the
> existing recovery. At present recover is fast so to preserve that
> the compatibility function would pass a revision range that is just
> HEAD.
>

There is nothing inherently wrong or broken with having an
incomplete rep cache. So, making this part of the recovery
procedure feels wrong.

> - We could avoid a public API and call some FSFS function from svnfsfs.
>

That is probably the best place, even in the longer term.

-- Stefan^2.
Received on 2015-05-28 18:02:16 CEST