Re: Looking to dump/load with minimal repository downtime

From: Eric Johnson <eric_at_tibco.com>
Date: Tue, 19 Aug 2014 09:55:15 -0700

Thanks for the excellent answers to my question! One more comment and
question below.

On Tue, Aug 19, 2014 at 12:05 AM, Branko Čibej <brane_at_wandisco.com> wrote:

> On Mon, Aug 18, 2014 at 7:43 PM, Eric Johnson <eric_at_tibco.com> wrote:
>
> I've got a crazy idea for minimizing downtime - since the servers are not
> being constantly bombarded with commits, I'm thinking that I can just do a
> dump/load on the fly, and see if any commits came in while the dump was
> happening, and if they did, try the dump again.
>
> That is:
>
> STARTREV=$(svnlook youngest "repodir/$REPONAME")
> svnadmin create "reload/$REPONAME"
> svnadmin dump "repodir/$REPONAME" | svnadmin load "reload/$REPONAME"
> ENDREV=$(svnlook youngest "repodir/$REPONAME")
> if [ "$STARTREV" = "$ENDREV" ]; then
>     mv "repodir/$REPONAME" "older/$REPONAME"
>     mv "reload/$REPONAME" "repodir/$REPONAME"
> fi
> # If the revisions did not match, repeat the above until they do.
>
> There's the tiniest bit of a race condition at the very end. But is it safe
> to do an svnadmin dump on a live repository?
>
>
> Dumping a live repository is perfectly safe. When the first (complete)
> dump and load finish, you don't have to retry the whole dump if there were
> any commits to the source repository; just do an incremental dump and
> incremental load.
>

I just did a timed comparison of svnadmin dump/load versus svnsync. The
dump/load was over 10x faster for my test case (20 min vs. 260 min). So I'll
follow the dump approach.
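
For the incremental step, here is a rough sketch of what I have in mind. The
function names missing_range and catch_up are my own, and I am assuming that
comparing 'svnlook youngest' on both repositories is enough to tell which
revisions still need to be loaded.

```shell
#!/bin/sh
# Sketch only: load the commits that arrived in the old repository while
# the full dump/load was running. Function names are hypothetical.

# Print the revision range still missing from the new repository,
# or nothing if it is already up to date.
missing_range() {
    loaded=$1       # youngest revision in the new repository
    source_head=$2  # youngest revision in the old repository
    if [ "$loaded" -lt "$source_head" ]; then
        echo "$((loaded + 1)):$source_head"
    fi
}

# Dump only the missing revisions from $1 (old) and load them into $2 (new).
catch_up() {
    range=$(missing_range "$(svnlook youngest "$2")" "$(svnlook youngest "$1")")
    if [ -n "$range" ]; then
        svnadmin dump --incremental -r "$range" "$1" | svnadmin load "$2"
    fi
}
```

Called as 'catch_up repodir/$REPONAME reload/$REPONAME', this should be a
no-op when nothing was committed during the full dump.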

>
> In other words:
>
> 1. Perform a full dump from [old] and load into [new]
> 2. Make [old] read-only
> 3. If there were any commits during (1), perform an incremental dump
> from [old] and load into [new]
> 4. Switch the repositories and restart the server to purge in-memory
> caches.
>
> The solution with svnsync is similar, except that instead of running
> 'svnadmin dump/load' you'll run 'svnsync'.
>
> Both solutions have a downside: they will not notice revision property
> changes (e.g., log message changes) in revisions that have already been
> synced to the new repository. There are ways to solve that, but by far the
> easiest way is to block revprop changes in the pre-revprop-change hook in
> the old repository while you're doing the upgrade.
>
> Just remember to always restart the server after switching from the old to
> the new repository to remove stale caches. (Yes, we're currently working on
> making that mostly unnecessary in 1.9.)
>
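
On the revprop point, a minimal sketch of how I might block changes for the
duration. The function name install_revprop_block and the ".pre-migration"
suffix are placeholders of mine. As far as I know, a repository with no
pre-revprop-change hook already rejects revprop changes, so this should only
matter for repositories that have enabled them.

```shell
#!/bin/sh
# Sketch only: install a pre-revprop-change hook that rejects every
# revision property change while the upgrade is in progress.
install_revprop_block() {
    hook="$1/hooks/pre-revprop-change"
    # Keep any existing hook so it can be restored afterwards.
    if [ -e "$hook" ]; then
        mv "$hook" "$hook.pre-migration"
    fi
    cat > "$hook" <<'EOF'
#!/bin/sh
# Subversion passes: REPOS REV USER PROPNAME ACTION
echo "Revision property changes are disabled during the repository upgrade." >&2
exit 1
EOF
    chmod +x "$hook"
}
```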

What are these caches? I've got memcached turned off for these (fsfs)
repositories. What, exactly, do I need to restart? All client access is via
HTTP. Restarting Apache for each of my repository updates would be
unfortunate, because I want to bounce the server as rarely as possible,
even if a restart only takes 3s.

It is one thing for me to take an individual repository offline for 1-5
seconds while I swap and do an incremental load, but it is another if I
take all 80+ repositories offline repeatedly.

Has anybody ever published a script that loops through a bunch of
repositories, and dump/loads, moves the old repository aside, does an
incremental dump/load, copies over existing hook scripts, and then moves
the new copy into place? I didn't see anything obvious in the "contrib"
sources.
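
To make the question concrete, this is roughly the loop I am picturing. It is
a sketch I have not run against real repositories. REPODIR, RELOAD and OLDER
are assumed to be directories on the same filesystem, and the function names
are made up for illustration.

```shell
#!/bin/sh
# Sketch only. REPODIR, RELOAD and OLDER are hypothetical variables.

# Copy live hook scripts (not the *.tmpl samples) from one repository
# to another.
copy_hooks() {
    src=$1
    dst=$2
    for hook in "$src"/hooks/*; do
        [ -f "$hook" ] || continue
        case "$hook" in
            *.tmpl) continue ;;
        esac
        cp -p "$hook" "$dst/hooks/"
    done
}

migrate_one() {
    live="$REPODIR/$1"; fresh="$RELOAD/$1"; aside="$OLDER/$1"
    svnadmin create "$fresh" || return 1
    # Note: '||' only sees the exit status of 'svnadmin load'.
    svnadmin dump -q "$live" | svnadmin load -q "$fresh" || return 1
    # The brief outage starts here: once the old repository is moved
    # aside, no new commits can land in it.
    mv "$live" "$aside" || return 1
    loaded=$(svnlook youngest "$fresh")
    head=$(svnlook youngest "$aside")
    if [ "$loaded" -lt "$head" ]; then
        svnadmin dump -q --incremental -r "$((loaded + 1)):$head" "$aside" |
            svnadmin load -q "$fresh" || return 1
    fi
    copy_hooks "$aside" "$fresh"
    mv "$fresh" "$live"
}

migrate_all() {
    for repo in "$REPODIR"/*/; do
        [ -d "$repo" ] || continue
        migrate_one "$(basename "$repo")" || echo "failed: $repo" >&2
    done
}
```

If a step fails after the move, the repository stays in OLDER and has to be
put back by hand, which is one reason I would rather use something already
tested.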

Eric
Received on 2014-08-19 18:56:04 CEST
