On 07/27/2011 01:34 AM, Nico Kadel-Garcia wrote:
> On Tue, Jul 26, 2011 at 2:33 AM, Andy Canfield <andy.canfield_at_pimco.mobi> wrote:
>> For your information, this is my backup script. It produces a zip file that
>> can be transported to another computer. The zip file unpacks into a
>> repository collection, giving, for each repository, a hotcopy of the
>> repository and a dump of the repository. The hotcopy can be reloaded on a
>> computer with the same characteristics as the original server; the dumps can
>> be loaded onto a different computer. Comments are welcome.
> Andy, can we love you to pieces for giving us a new admin to educate
> in subtleties?
Sure! I'm good at being ignorant. FYI I have a BS in Computer Science
from about 1970 and an MS in Operations Research from 1972, and worked
in Silicon Valley until I moved to Thailand in 1990. So although I am
not stupid, I can be very ignorant.
And also the IT environment here is quite different. For example, MySQL
can sync databases if you've got a 100Mbps link. Ha ha. I invented a way
to sync two MySQL databases hourly over an unreliable link that ran at
about modem speeds. I can remember making a driver climb a flagpole to
make a cell phone call because the signal didn't reach the ground. To
this day we run portable computers out in the field and communicate via
floppynet. In this region hardware costs more than people, and software
often costs nothing.
>> #! /bin/bash
>> # requires root access
>> if [ "$(whoami)" != root ]; then
>>     sudo "$0"
>>     exit
>> fi
>> # controlling parameters
>> ls -ld $SRCE
> Unless the repository is only readable and owned by root, this should
> *NOT* run as root. Seriously. Never do things as the root user that
> you don't have to. If the repository owner is "svn" or "www-data" as
> you've described previously, execute this as the relevant repository
> owner.
There are reasonable justifications for running it as root:
 Other maintenance scripts must be run as root, and this puts all
maintenance in a central pool. My maintenance scripts are crontab jobs
of the form /root/bin/TaskName.job which runs /root/bin/TaskName.sh and
pipes all stderr and stdout to /root/TaskName.out. Thus I can skim
/root/*.out and have all the job status information at my fingertips.
 For some tasks, /root/bin/TaskName.job is also responsible for
appending /root/TaskName.out to /root/TaskName.all so that I can see
earlier outputs. There is a job that erases /root/*.all on the first of
each month.
 I have heard for a long time that one should never run a GUI as root.
None of these maintenance scripts are GUI programs.
 There are many failure modes that will only arise if it is run as
non-root. For example, if run as root, the command "rm -rf
/data/svnbackup" will absolutely, for sure, get rid of any existing
/data/svnbackup, whoever it is owned by and whatever junk is inside it.
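The .job wrapper pattern described above can be sketched as a small helper; the task name and directory layout are the ones from my description, generalized so nothing is hard-coded to /root:

```shell
# Sketch of the /root/bin/TaskName.job pattern described above.
# "run_job Demo /root" would run /root/bin/Demo.sh, capture all of its
# output in /root/Demo.out, and append that to /root/Demo.all as history.
run_job() {  # $1 = task name, $2 = base directory (e.g. /root)
    "$2/bin/$1.sh" > "$2/$1.out" 2>&1   # capture stdout and stderr
    cat "$2/$1.out" >> "$2/$1.all"      # keep a running history
}
```

With that in place, skimming the base directory's *.out files gives the latest status of every job, and the *.all files give the history.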
>> # Construct a new empty SVNParent repository collection
>> rm -rf $DEST
>> mkdir $DEST
>> chown $APACHE_USER $DEST
>> chgrp $APACHE_GROUP $DEST
>> chmod 0700 $DEST
>> ls -ld $DEST
> And do..... what? You've not actually confirmed that this has succeeded
> unless you do something if these bits fail.
Many of your comments seem to imply that this script has not been
tested. Of course it's been tested already, and in any production
environment it will be tested again. And if stdout and stderr are piped
to /root/SVNBackup.out then I can check that output text reasonably
often and see that it is still running. In this case I would check it
daily for a week, weekly for a month or two, yearly forever, and every
time somebody creates a new repository.
Also, by the standards of this part of the world, losing a day's work is
not a catastrophe. Most people can remember what they did, and do it
again, and it probably only takes a half-day to redo.
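That said, making the setup steps fail loudly costs only a few characters. A sketch, with a demo path standing in for the real /data/svnbackup (chown/chgrp omitted here since they need root):

```shell
# Abort with a message the moment any setup step fails, instead of
# ploughing on.  $DEST here is a demo path, not the real /data/svnbackup.
DEST=${DEST:-/tmp/svnbackup-demo}
rm -rf "$DEST"      || { echo "FATAL: cannot clear $DEST"  >&2; exit 1; }
mkdir "$DEST"       || { echo "FATAL: cannot create $DEST" >&2; exit 1; }
chmod 0700 "$DEST"  || { echo "FATAL: cannot chmod $DEST"  >&2; exit 1; }
echo "fresh $DEST ready"
```

The FATAL lines end up in /root/SVNBackup.out, so a glance at that file shows where the run stopped.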
>> # Get all the names of all the repositories
>> # (Also gets names of any other entry in the SVNParent directory)
>> cd $SRCE
>> ls -d1 * > /tmp/SVNBackup.tmp
> And *HERE* is where you start becoming a dead man if mkdir $DEST
> failed. I believe that it works in your current environment, but if
> the parent of $DEST does not exist, you're now officially in deep
> danger executing these operations in whatever directory the script was
> run from.
As noted above, $DEST is /data/svnbackup. The parent of $DEST is /data.
/data is a partition on the server. If that partition is gone, that's a
failure that we're talking about recovering from.
>> # Process each repository
>> for REPO in `cat /tmp/SVNBackup.tmp`
> And again you're in trouble. If any of the repositories have
> whitespace in their names, or funky EOL characters, the individual
> words will be parsed as individual arguments.
This is Linux. Anyone who creates a repository with white space in the
name gets shot.
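Fair enough, though the whitespace-safe version is not much longer. A sketch (the helper name is mine) that sidesteps both the temp file and the word-splitting:

```shell
# Enumerate the immediate subdirectories of an SVNParent directory,
# one name per line, safely even if a name contains spaces.
list_repos() {  # $1 = SVNParent directory
    for repo in "$1"/*/; do
        [ -d "$repo" ] || continue    # skip if the glob matched nothing
        repo=${repo%/}                # drop the trailing slash
        printf '%s\n' "${repo##*/}"   # strip the leading path
    done
}
```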
>> # some things are not repositories; ignore them
>> if [ -d $SRCE/$REPO ]
Here is a likely bug in the script. I treat every subdirectory of the
SVNParent repository collection as if it were a repository, but it might
not be. There might be valid reasons for having a different type of
subdirectory in there. Probably this line should read something like
if [ -d $SRCE/$REPO/hooks ]
then
    ... back up the repository ...
else
    ... just copy it over ...
fi
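That test can be tightened a little further. A hypothetical helper (the name is mine) that checks for two things "svnadmin create" always leaves behind:

```shell
# Treat a directory as a repository only if it contains the hooks/
# subdirectory and the top-level "format" file that "svnadmin create"
# writes.  Plain data directories fail both tests.
is_svn_repo() {  # $1 = candidate directory
    [ -d "$1/hooks" ] && [ -f "$1/format" ]
}
```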
>> # back up this repository
>> echo "Backing up $REPO"
>> # use hotcopy to get an exact copy
>> # that can be reloaded onto the same system
>> svnadmin hotcopy $SRCE/$REPO $DEST/$REPO
>> # use dump to get an inexact copy
>> # that can be reloaded anywhere
>> svnadmin dump $SRCE/$REPO > $DEST/$REPO.dump
> See above. You're not reporting failures, in case the repository is
> not of a compatible Subversion release as the current "svnadmin"
> command. (This has happened to me when someone copied a repository to
> a server with older Subversion.)
Yes. But then the failure was on the setting up the repository, not on
backing it up. Perhaps I should run
svnadmin verify $SRCE/$REPO
first and take appropriate action if it fails. Oh, please don't tell me
that 'svnadmin verify' doesn't really verify completely!
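A sketch of that verify-first idea (svnadmin is assumed to be on PATH, both subcommands accept -q, and the helper name is mine):

```shell
# Verify a repository before dumping it; skip the dump and report
# loudly if verification fails, so a corrupt repository cannot
# silently produce a corrupt backup.
backup_repo() {  # $1 = repository path, $2 = dump file to write
    if svnadmin verify -q "$1"; then
        svnadmin dump -q "$1" > "$2"
    else
        echo "verify FAILED for $1 -- not dumped" >&2
        return 1
    fi
}
```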
On another point, "reporting failures" ought to mean "sending e-mail to
the sysadmin telling him that it failed." I've been trying to do that for
years and cannot. I cannot send e-mail to an arbitrary target e-mail
address user_at_example.com from a Linux shell script.
* Most methods require 'sendmail', notoriously the hardest program on
the planet to configure.
* I found that installing 'sendmail', and not configuring it at all,
prevented apache from starting at boot time. Probably something was
wrong with my setup.
* Much of the documentation on sendmail only covers sending e-mail to an
account on that server computer, not to user_at_example.com elsewhere in
the world. As if servers were timesharing systems.
* Sendmail has to talk to an SMTP server. In the past couple of years it
seems as if all the SMTP servers in the world have been linked into an
authorization environment to prevent spam, so you can't just run your
own SMTP server - it's not certified.
* Thunderbird knows how to log in to an SMTP server; last time I looked
sendmail did not.
Without e-mail, any notification system requires my contacting the
machine, rather than the machine contacting me. And that is unreliable.
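For what it is worth, one route that avoids sendmail entirely: curl (7.20 and later) can speak authenticated SMTP itself, logging in the way Thunderbird does. A sketch; the server, account, addresses, and password below are all placeholders:

```shell
# Mail a short message through an authenticated SMTP server using curl.
# Everything ending in example.com, and PASSWORD, is a placeholder.
notify_admin() {  # $1 = subject, $2 = body
    printf 'Subject: %s\n\n%s\n' "$1" "$2" |
    curl --silent \
         --url smtps://smtp.example.com:465 \
         --mail-from backup@example.com \
         --mail-rcpt admin@example.com \
         --user 'backup@example.com:PASSWORD' \
         --upload-file -
}
```

Called as, say, notify_admin "SVNBackup failed" "see /root/SVNBackup.out", that would let the machine contact me instead of the other way around.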
>> # Show the contents
>> echo "Contents of the backup:"
>> ls -ld $DEST/*
This is for /root/SVNBackup.out. It lists the repositories that have
been included in the backup.
Indeed, the above line that reads
echo "Backing up $REPO"
only exists because hotcopy outputs progress info. I tried "--quiet" and
it didn't shut up. Maybe "-q" works.
>> # zip up the result
>> cd $DEST
>> zip -r -q -y $DEST.zip .
> Don't use zip for this. zip is not installed by default on a lot of
> UNIX and Linux systems, tar and gzip are, and give better compression.
> Just about every uncompression suite in the world supports .tgz files
> as gzipped tarfiles, so it's a lot more portable.
The 'zip' program is installable on every computer I've ever known. And,
at least until recently, there were LOTS of operating systems that did
not support .tar.gz or .tar.bz2 or the like. IMHO a zipped file is a lot
more effectively portable. And the compression ratio is close enough
that I'm willing to get 15% less compression for the portability.
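For the record, the tgz route Nico suggests is a one-liner too, and tar stores symlinks as symlinks by default (much like zip -y). A sketch mirroring the script's zip step, with a demo path standing in for the real $DEST:

```shell
# tar+gzip equivalent of "cd $DEST; zip -r -q -y $DEST.zip ." --
# the archive lands next to $DEST, just as the .zip does.
DEST=${DEST:-/tmp/svnbackup-tgz-demo}
mkdir -p "$DEST"
echo demo > "$DEST/sample.txt"
( cd "$DEST" && tar -czf "$DEST.tgz" . )
```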
> Also, the script has ignored the problems of symlinks. You may not use
> them, but a stack of people use symlinked files to pre-commit scripts,
> password files, or other tools among various repositories from an
> outside source. If you haven't at least searched for and reported
> symlinks, you've got little chance of properly replicating them for
> use elsewhere.
My guess is that there are two types of symlinks; those that point
inside the repository and those that point outside the repository. Those
that point inside the repository should be no problem. Those that point
outside the repository are bad because there is no guarantee that the
thing pointed to exists on any given machine that you use.
And AFAIK svnadmin and svndump preserve symlinks as such, and that is
the best that I can do in either case.
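Searching for and reporting them, as Nico suggests, is cheap. A sketch (the helper name is mine) that flags absolute targets, the ones most likely to point outside the tree:

```shell
# List every symlink under a directory and classify its target.
# Absolute targets are the risky ones when the backup moves to another
# machine; relative ones usually stay inside the tree (though a ../
# target can still escape it).
report_symlinks() {  # $1 = directory to scan
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            /*) echo "absolute symlink: $link -> $target" ;;
            *)  echo "relative symlink: $link -> $target" ;;
        esac
    done
}
```

Running that before the backup and keeping its output in /root/SVNBackup.out would at least leave a record of what needs fixing by hand after a restore.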
Also, this is the kind of thing where you back up the symlink and later,
if we must restore, some human being says "What does this symlink point
to?" and deals with it by hand.
>> # Talk to the user
>> echo "Backup is in file $DEST.zip:"
>> ls -ld $DEST.zip
> It looks like you're relying on "ls -ld"
Again, this is a more-or-less standard part for the purpose of putting
information into the /root/SVNBackup.out file. All of my backup scripts
do this. Sometimes I look and say to myself "Why did the backup suddenly
triple in size?" and dig around and discover that some subdirectory was
added that should not have been present.
>> # The file $DEST.zip can now be transported to another computer.
> And for a big repository, this is *grossly* inefficient. Transmitting
> bulky compressed files means that you have to send the whole thing in
> one bundle, or incorporate wrappers to split it into manageable
> chunks. This gets awkward as your Subversion repositories grow, and
> they *will* grow because Subversion really discourages discarding
> *anything* from the repositories.
A backup file that is created on an attached portable disk does not need
to be transported.
A backup file that is transmitted over a LAN once a day is not a
problem, no matter how big; 3 hours is a reasonable time frame for the
transfer.
Historically I ran a crontab job every morning at 10AM that copied a
backup file to a particular workstation on the LAN. By 10AM that
workstation is turned on, and if it slows down, well, the lady who uses
it is not technically minded enough to figure out WHY it's slowing down.
And it was only a few megabytes.
Yeah, a zip of the entire SVNParent repository collection might be
too big to send over the Internet.
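And if a zip ever does outgrow a single transfer, splitting it into fixed-size chunks takes two standard commands. A sketch (helper name mine; the 100m chunk size is arbitrary):

```shell
# Split an archive into fixed-size pieces for transfer; the receiving
# end reassembles them with cat, in order.
split_backup() {  # $1 = archive file, $2 = chunk size (e.g. 100m)
    split -b "$2" "$1" "$1.part-"
}
# receiving end:  cat SVNBackup.zip.part-* > SVNBackup.zip
```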
Oh yes, one more thing. Using svnadmin in various ways it is possible to
purge old revisions from a repository. I would expect that we do that
periodically, maybe once a year. If we're putting out version 5.0 of
something, version 3.0 should not be in the repository; it should be in
an archive somewhere.
> I'd very strongly urge you to review
> the use of "svnsync" to mirror the content of the repository to
> another server on another system, coupled with a wrapper to get any
> non-database components separately. This also reduces "churn" on your
> drives, and can be so much faster that you can safely run it every 10
> minutes for a separate read-only mirror site, a ViewVC or Fisheye
> viewable repository, or publication of externally accessible
> downloadable source.
I shy away from svnsync right now because it requires me to get TWO of
these Subversion systems running. At present I am almost able to get
one running.
> As harsh as I'm being, Andy, it's actually not bad for a first shot by
> someone who hasn't been steeped in the pain of writing industry grade
> code like some of us. For a one-off in a very simple environment, it's
> fine; something to get this week's backups done while you think about
> a more thorough tool, it's reasonable, except for the big booby trap
> David Chapman pointed out about using the hotcopies, not the active
> repositories, for zipping up.
Thank you. I think the key phrase here is "a very simple environment".
How much do we pay for a server? 400 dollars. One guy recommended buying
a server for 4,000 dollars and he was darned near fired for it.
I fixed the booby trap already. Your comments will lead to some other
changes. But not, for now, a second computer.
OH! I thought of something else!
Suppose we do a backup every night at midnight, copying it to a safe
place. And suppose that the server dies at 8PM Tuesday evening. Then all
commits that occurred on Tuesday have been lost. Presumably we'd find
out about this on Wednesday.
But a working copy is a valid working copy until you delete it. Assuming
that the working copies still exist, all we need to do is
* Restore the working SVNParent repository collection on a replacement
server.
* Have everyone 'svn commit' from their working copies.
* Unscramble the merge problems, which should be few.
This becomes feasible if nobody deletes their working copy until 48
hours after their last commit. And my guess is that people will do that
naturally. People who are working on the package will keep one working
copy indefinitely, updating it but not checking out a whole new one.
People who do only brief work on the package will not purge the working
copy until they start worrying about disk space.
Thank you very much.
Received on 2011-07-27 00:15:14 CEST