On Tue, Aug 2, 2011 at 2:07 AM, zhiwei chen <zhiweik_at_gmail.com> wrote:
> hi, everyone.
> We have many svn repositories, more than 100,000, but every
> repository has less than 1024M.
> So, which svn backup strategies should I use?
Great bird of space, what are you running? SourceForge? You're
approaching 100 TB of repository space!!!
I have to assume that 99.9% of these are idle, auto-generated
repositories created as part of some regression testing or continuous
build infrastructure. I went through something like this with an
in-house backup system that used a database to manage hardlinks, where
most of the directories had no actual edits or unlocked files in them.
I had to optimize it by basically ignoring everything that was the
non-active equivalent of tags, which cut an insane 5-day restoration
procedure down to 2 hours.
I assume that the old, stable repositories are what most of us would
use as tags: suitable to lock down and back up with rsync, star, or a
similar tool that will not re-copy every byte every time you run it,
that can be run twice without overwriting already-transmitted files,
and that can be gracefully managed to select or deselect targets. This
will mirror not only the revisions, but the file ownership,
authentication, and scripting internal to each repository. It won't
mirror HTTP access or web configs, or SSH-based access configurations,
so treat those separately.
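As a rough sketch of that approach, assuming a flat layout with every
repository directly under /srv/svn and a backup host called
backup.example.com (both names made up here):

    import subprocess
    from pathlib import Path

    REPO_ROOT = Path("/srv/svn")                  # hypothetical parent dir
    DEST = "backup.example.com:/srv/svn-backup/"  # hypothetical target

    for repo in sorted(REPO_ROOT.iterdir()):
        # Crude check that this is a repository: FSFS/BDB repos carry
        # a top-level 'format' file.
        if not (repo / "format").is_file():
            continue
        # -a preserves permissions (and ownership when run as root);
        # --ignore-existing makes a second run safe, since files that
        # were already transmitted are never overwritten.
        subprocess.run(
            ["rsync", "-a", "--ignore-existing", str(repo), DEST],
            check=True,
        )

The --ignore-existing flag is what gives you the "run it twice safely"
behavior; for repositories that are truly locked down that's all you
need, but drop it if you want rsync to pick up changed files as well.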
That said, the databases can be synchronized with svnsync to a remote
server for efficiency, and to help avoid corruption issues from
mirroring files in the midst of database interactions. This will *not*
get you failover repositories with identical UUIDs suitable for
"svn switch" operations, but it will allow you to update your backup
server's Subversion binaries without interfering with the primary
system. Any repository that has had updates since the last svnsync,
svnadmin dump, or other backup, however, will be prone to
"split-brain" problems, where a new revision committed on the failover
or recovered server does not match the revision that previously had
the same number on the original server, and chaos will ensue.
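A minimal sketch of setting up such a mirror, with a hypothetical
source URL and mirror path (the pre-revprop-change hook is needed
because svnsync records its bookkeeping in revision properties on the
mirror):

    import subprocess
    from pathlib import Path

    SRC = "http://svn.example.com/repos/project"  # hypothetical source
    MIRROR = Path("/srv/svn-mirror/project")      # hypothetical mirror

    MIRROR.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["svnadmin", "create", str(MIRROR)], check=True)

    # The mirror must accept revprop changes; an always-succeed
    # pre-revprop-change hook is the standard trick.
    hook = MIRROR / "hooks" / "pre-revprop-change"
    hook.write_text("#!/bin/sh\nexit 0\n")
    hook.chmod(0o755)

    mirror_url = MIRROR.resolve().as_uri()  # file:// URL for the mirror
    subprocess.run(["svnsync", "initialize", mirror_url, SRC], check=True)
    # Re-run just this last step from cron to keep the mirror current.
    subprocess.run(["svnsync", "synchronize", mirror_url], check=True)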
Split-brain is something that people don't seem to worry about much
for small repositories: you can notify your clients that they need to
check out fresh working copies and copy over their working files, and
they'll only lose some recent commits. But it's potentially really,
really nasty for automated procedures.
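If you want an automated sanity check, one rough approach (sketched
below with hypothetical URLs, and assuming svn 1.9+ for --show-item)
is to compare the log entry for the highest revision the two servers
have in common; if author, date, or message differ, the histories have
diverged:

    import subprocess

    PRIMARY = "http://svn.example.com/repos/project"      # hypothetical
    FAILOVER = "http://backup.example.com/repos/project"  # hypothetical

    def youngest(url):
        # HEAD revision of the repository at `url` (needs svn 1.9+).
        out = subprocess.run(
            ["svn", "info", "--show-item", "revision", url],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())

    def log_entry(url, rev):
        # Author/date/message of one revision, as comparable text.
        out = subprocess.run(
            ["svn", "log", "-r", str(rev), url],
            capture_output=True, text=True, check=True,
        )
        return out.stdout

    rev = min(youngest(PRIMARY), youngest(FAILOVER))
    if log_entry(PRIMARY, rev) != log_entry(FAILOVER, rev):
        print("split-brain: r%d differs between the two servers" % rev)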
Frankly, this is the point where you call WANdisco and say "Hi, I've
got a problem: do you have a commercial-grade solution?" They have
tools that will do multi-master setups and avoid the "split-brain"
problem, and have probably already addressed the backup needs.
Received on 2011-08-02 13:36:14 CEST