Philip Martin wrote:
> I've been thinking of writing some code to populate the rep-cache from
> existing revisions. This code would parse the revision, a bit like
> verify, identify checksums in that revision and add any that are found
> to the rep-cache. This would be time consuming if run on the whole
> repository but would run perfectly well in a separate process while the
> repository remains live. It could also be run over a revision range
> rather than just the whole repository, and running on a single revision
> such as HEAD would be fast.
+1.
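To make the idea concrete, here is a toy sketch in Python of such a
pass over a revision range. It is not FSFS code: the dict-based cache,
the name populate_rep_cache and the (sha1, location) pairs are all
invented for illustration, and a real implementation would parse the
revision files much as verify does.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, not the FSFS on-disk format: a revision is a
# list of (sha1, location) pairs and the rep-cache maps sha1 -> location.
Rep = Tuple[str, str]


def populate_rep_cache(revisions: Dict[int, List[Rep]],
                       rep_cache: Dict[str, str],
                       start: int, end: int) -> int:
    """Add checksums found in revisions start..end (inclusive).

    Entries already in the cache are left untouched, so running the
    same range twice is harmless. Returns the number of entries added.
    """
    added = 0
    for rev in range(start, end + 1):
        for sha1, location in revisions.get(rev, []):
            if sha1 not in rep_cache:
                rep_cache[sha1] = location
                added += 1
    return added
```

Because existing entries are never overwritten, the pass is safe to
repeat, and calling it with start == end == HEAD touches only one
revision.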
> I believe the code will be relatively straightforward; if anything, it
> is the API that is more of a problem.
>
> - We could add a public svn_fs_rep_cache(). This is backend specific
> but there is precedent: we have svn_fs_berkeley_logfiles() and
> svn_fs_pack().
>
> - We could add a more general svn_fs_optimize(). This would do backend
> specific optimizations that may change in future versions. Perhaps
> passing backend-specific flags?
>
> - We could add the behaviour to svn_fs_recover() by revving the function
> with a revision range. This would "recover" the rep-cache after the
> existing recovery. At present recover is fast, so to preserve that
> the compatibility function would pass a revision range that is just
> HEAD.
>
> - We could avoid a public API and call some FSFS function from svnfsfs.
>
> I'll probably go with the last option initially. Any comments?
I think the interface to this should be explicit, not hidden in a
generic 'optimize' or 'recover' function. The last option sounds good
as a starting point.
Other than that, I have no opinions on the API yet, nor on the
specific range of functionality that it should offer (examples:
revision ranges, validating existing entries, clearing part or all of
the cache).
> I should note that WANdisco has an interest in this code being
> developed.
I suppose many companies and power users have to deal with issues
where this would be useful.
It might also be useful to consider whether and how Subversion could
tell us whether the rep cache is up to date. I haven't thought about
this in depth, but as an initial idea, tracking the last revision
number N where all revs [0 .. N] are known to be cached would be a
possible starting point for such a feature.
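As a rough illustration of that bookkeeping, here is a minimal sketch
in Python. Everything in it is an assumption made for the example: the
name advance_cached_upto, the use of -1 to mean "nothing known yet",
and the idea that we have a set of revisions whose checksums have been
added to the cache.

```python
from typing import Set


def advance_cached_upto(cached_upto: int, populated: Set[int],
                        head: int) -> int:
    """Return the largest N such that revs 0..N are all known cached.

    cached_upto is the previous value of N (-1 if nothing is known),
    populated holds revisions processed since then, and head is the
    youngest revision. N only advances across a gap-free run.
    """
    n = cached_upto
    while n + 1 <= head and (n + 1) in populated:
        n += 1
    return n
```

With a value like this stored, "is the rep cache up to date?" reduces
to checking whether N equals HEAD, and a later populate run only needs
to cover revisions N+1 onwards.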
- Julian
Received on 2015-05-27 22:30:08 CEST