On 07/26/2011 04:20 PM, Stefan Sperling wrote:
> On Tue, Jul 26, 2011 at 01:33:09PM +0700, Andy Canfield wrote:
>> For your information, this is my backup script. It produces a zip
>> file that can be transported to another computer. The zip file
>> unpacks into a repository collection, giving, for each repository, a
>> hotcopy of the repository and a dump of the repository. The hotcopy
>> can be reloaded on a computer with the same characteristics as the
>> original server; the dumps can be loaded onto a different computer.
>> Comments are welcome.
> Please also make a backup of every individual revision from the
> post-commit hook, like this:
>
> [[[
> #!/bin/sh
>
> REPOS="$1"
> REV="$2"
>
> svnadmin dump "${REPOS}" -q --incremental --deltas -r "${REV}" > /backup/"${REV}".dmp
> ]]]
>
> And make /backup a separate filesystem, preferably on a different host
> or some disk storage that runs independently from the host.
In Linux a separate "filesystem" is often just another partition on the same
hard disk, and thus not to be trusted too much. For safety, an external hard
disk, with its buffers flushed, should be good enough. No need for an entire
other host. Yes?
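One quick sanity check on that point: whether /backup really is a separate filesystem can be verified by comparing devices. This is just a sketch (it assumes POSIX df and awk, and that /backup exists); it only tells you /backup is a separate mount, not that it is a separate physical disk:

```shell
#!/bin/sh
# Sketch: warn if /backup lives on the same device as the root filesystem.
dev_backup=$(df -P /backup | awk 'NR==2 {print $1}')
dev_root=$(df -P / | awk 'NR==2 {print $1}')
if [ "$dev_backup" = "$dev_root" ]; then
    echo "WARNING: /backup is on the root filesystem, not a separate disk" >&2
fi
```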
> You will thank me one day when your server's disks die at the wrong moment
> e.g. because of power failure or overheating.
> In such cases it is possible that not all data has been flushed to disk yet.
> The only good data is in the OS buffer cache which the above svnadmin dump
> command will get to see. However, even revisions committed several
> *minutes* before such a crash can appear corrupted on disk when you reboot.
Thank you very much.
Linux ext3 filesystems used to be pretty much immune to unflushed buffers;
anything older than five seconds was written to disk. But now, with ext4,
there's no such guarantee, and I've seen it leave buffers unflushed after a
full minute. And I'm not supposed to be able to see that!
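One way to narrow that window is to flush the dump to disk explicitly before the hook exits. A sketch of the quoted post-commit hook with a sync added (assumption: a plain `sync`, which flushes all dirty buffers, is acceptable here; `sync FILE` for a single file needs coreutils 8.24 or later):

```shell
#!/bin/sh
# Sketch: per-revision incremental dump, flushed before the hook returns.
REPOS="$1"
REV="$2"
DUMPFILE="/backup/${REV}.dmp"

svnadmin dump "${REPOS}" -q --incremental --deltas -r "${REV}" > "${DUMPFILE}"
sync   # force the dump out of the buffer cache onto stable storage
```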
Of course, even Linux isn't immune to a burned-out hard disk controller.
As for overheating, I can't help but joke that perhaps the overheating
was caused by all the backups you were doing? Ha ha. Of course not!
> I've seen this happening (on Windows, with NTFS -- lots of damage;
> but other operating systems aren't immune to this either).
> We could tell that the buffer cache data was good because there were
> multiple corrupted revision files (one repository had more than 20
> broken revisions), each with random junk in some parts, and all broken
> parts were 512 byte blocks, i.e. disk sectors. But in the parts that
> were not broken they referred to each other in ways that made perfect
> sense. So before the crash they were all OK. There were no backups so
> we had to manually repair the revisions (this requires intricate
> knowledge about the revision file format and takes time...)
>
> When this happens you have an unusable repository. Anything referring
> to the broken revisions will fail (commit, diff, update, ...).
> Assuming the incremental dumps weren't destroyed in the catastrophe
> you can load the incremental dumps on top of your last full backup and
> get to a good state that is very close to the point in time when the
> crash happened.
> Without the incremental dumps you'll have the last full backup.
> But anything committed since could be lost forever.
Anything committed since the last full backup would be lost only if it no
longer exists in a developer's working copy. The size of the mess depends on
how many commits you lost: ten per developer per day, with 20 developers, is
a massive headache; two per developer per week, with three developers, is
not a catastrophe.
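The restore path Stefan describes, loading the incremental dumps on top of the last full backup, could be sketched like this. Assumptions: the full backup is /backup/full.dmp, the per-revision dumps use the <rev>.dmp naming from the hook, and /var/svn/restored is an illustrative target path. Note the dumps must be loaded in numeric revision order, which a plain shell glob does not guarantee (10.dmp sorts before 2.dmp lexicographically):

```shell
#!/bin/sh
# Sketch: rebuild a repository from the last full dump plus incrementals.
NEWREPOS=/var/svn/restored        # hypothetical target path

svnadmin create "${NEWREPOS}"
svnadmin load -q "${NEWREPOS}" < /backup/full.dmp

# Load incrementals in revision order; sort -n avoids 10.dmp before 2.dmp.
cd /backup
for f in $(ls [0-9]*.dmp | sort -n); do
    svnadmin load -q "${NEWREPOS}" < "$f"
done
```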
As I understand Subversion,
[a] The server has no idea who has a working copy.
[b] The checkout builds a working copy on the workstation from the
server's repository.
[c] What is on the developer's hard disk is a working copy.
[d] What is on the developer's hard disk continues to be a working copy,
even after a commit.
[e] If the developer tries to make revisions to his working copy six
months after his last commit, then tries to commit, he's going to have a
major mess on his hands trying to reconcile things. The working copy is
still a valid working copy.
[f] Unlike a lock, which he grabs and then releases, he never gives up
his working copy; it is valid perpetually.
[g] The usual way a working copy goes away is with the "rm -rf" command.
Thanks for the great information!
Received on 2011-07-26 14:49:26 CEST