backup and restore (was: Re: Am thinking of using subversion as a general purpose filing system)

From: <kfogel_at_collab.net>
Date: 2004-04-02 21:03:09 CEST

andy.glew@amd.com writes:
> Most transactional database systems keep a transaction log
> somewhere. It's sequentially written, and is therefore usually
> captured coherently by even filesystem backup software.
> Moreover, its inconsistencies are easy to detect.
>
> In theory, if the transaction log goes back to the beginning
> of the database, the entire database can be reconstructed.
>
> It's sometimes possible to build a restore tool that
> takes the possibly corrupt database, and the transaction log,
> and gets to a clean state.

Here's a script that demonstrates this for a Subversion repository.
You'll have to piece together what's happening; it's commented, but
you still need to read the code to understand it.

I haven't had time to turn this into a set of user-friendly,
parameterized backup scripts for Subversion admins, unfortunately.
But maybe posting it here will help someone.
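
As a possible starting point for such a parameterized version, here is
a minimal sketch of one backup step as a reusable function. It follows
the same order as the script below: note the inactive logfiles, copy
every logfile, and only then delete the ones noted as inactive. The
name backup_logs and its two arguments are made up for illustration;
db_archive (list logfiles no longer in use) and db_archive -l (list all
logfiles) are used the same way the script uses them.

```shell
# Sketch only: backup_logs and its arguments are hypothetical names.
# Set DB_ARCHIVE to the full path if db_archive is not on your PATH.
DB_ARCHIVE=${DB_ARCHIVE:-db_archive}

# Usage: backup_logs REPOS_DIR DEST_DIR
backup_logs ()
{
  repos=$1
  mkdir -p "$2" || return 1
  # Resolve DEST_DIR to an absolute path, since we cd below.
  dest=`cd "$2" && pwd` || return 1
  # Work in a subshell so the caller's working directory is untouched.
  (
    cd "${repos}/db" || exit 1
    # 1. Note which logfiles are inactive before copying anything.
    inactive=`"${DB_ARCHIVE}"` || exit 1
    # 2. Copy every logfile, active or not.
    for logfile in `"${DB_ARCHIVE}" -l`; do
      cp "${logfile}" "${dest}/" || exit 1
    done
    # 3. Only now remove the logfiles noted as inactive in step 1.
    for logfile in ${inactive}; do
      rm -f "${logfile}"
    done
  )
}
```

A nightly job might then call something like
"backup_logs /path/to/repos /backups/incremental-1" (both paths are
placeholders). This covers only the logfiles; the full backup also
needs the copy of the repository itself, as the script shows.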

--------------------8-<-------cut-here---------8-<-----------------------
#!/bin/sh

# Try some backup/restore procedures for Subversion repositories.

# You might need to customize these paths.
SVN=svn
SVNADMIN=svnadmin
SVNLOOK=svnlook
DB_ARCHIVE=/usr/local/BerkeleyDB.4.2/bin/db_archive
DB_RECOVER=/usr/local/BerkeleyDB.4.2/bin/db_recover

# Any binary file of about 64k will do; it doesn't have to be /bin/ls.
DATA_BLOB=/bin/ls

# You shouldn't need to customize below here.
SANDBOX=`pwd`/backups-test-tmp
FULL_BACKUPS=${SANDBOX}/full
INCREMENTAL_PREFIX=${SANDBOX}/incremental-logs
RECORDS=${SANDBOX}/records
PROJ=myproj
REPOS=${PROJ}-repos

rm -rf ${SANDBOX}
mkdir ${SANDBOX}
mkdir ${RECORDS}

cd ${SANDBOX}

${SVNADMIN} create --bdb-log-keep ${REPOS}
${SVN} co file://${SANDBOX}/${REPOS} wc

cd wc

# Put in enough data for us to exercise the logfiles.
cp ${DATA_BLOB} ./a1
cp ${DATA_BLOB} ./b1
cp ${DATA_BLOB} ./c1
${SVN} -q add a1 b1 c1
${SVN} -q ci -m "Initial add."

echo "Created test data."

cd ..

# Exercise the logfiles by moving data around a lot. Note that we
# avoid adds-with-history, since those cause much less Berkeley
# activity than plain adds.
#
# Call this from the parent of wc, that is, with $SANDBOX as CWD.
# Pass one argument, a number, indicating how many cycles of exercise
# you want. The more cycles, the more logfiles will be generated.
# The ratio is about two cycles per logfile.
# (Defined without the 'function' keyword, which plain /bin/sh may reject.)
exercise ()
{
   limit=${1}

   saved_cwd=`pwd`
   cd ${SANDBOX}/wc

   echo ""
   i=1
   while [ ${i} -le ${limit} ]; do
     mv a1 a2
     mv b1 b2
     mv c1 c2
     ${SVN} -q rm a1 b1 c1
     ${SVN} -q add a2 b2 c2
     ${SVN} -q ci -m "Move 1s to 2s, but not as cheap copies."

     mv a2 a1
     mv b2 b1
     mv c2 c1
     ${SVN} -q rm a2 b2 c2
     ${SVN} -q add a1 b1 c1
     ${SVN} -q ci -m "Move 2s back to 1s, same way."

     echo "Exercising repository, pass ${i} of ${limit}."
     i=$((i + 1))
   done
   echo ""

   cd ${saved_cwd}
}

# Generate some logfile activity.
exercise 10

# Do a full backup.
head=`${SVNLOOK} youngest ${REPOS}`
echo "Starting full backup (at r${head})..."
mkdir ${FULL_BACKUPS}
mkdir ${FULL_BACKUPS}/${PROJ}
mkdir ${FULL_BACKUPS}/${PROJ}/repos
mkdir ${FULL_BACKUPS}/${PROJ}/logs
cd ${REPOS}/db
${DB_ARCHIVE} > ${RECORDS}/${PROJ}-full-backup-inactive-logfiles
cd ../..
cp -a ${REPOS} ${FULL_BACKUPS}/${PROJ}/repos/
cd ${REPOS}/db
for logfile in `${DB_ARCHIVE} -l`; do
  # For maximum paranoia, we want repository activity *while* we're
  # making the full backup.
  exercise 5
  cp ${logfile} ${FULL_BACKUPS}/${PROJ}/logs
done
cat ${RECORDS}/${PROJ}-full-backup-inactive-logfiles | xargs rm -f
cd ../..
echo "Full backup completed (r${head} was head when started)."

# Do the incremental backups for a nominal week.
for day in 1 2 3 4 5 6; do
  exercise 5
  head=`${SVNLOOK} youngest ${REPOS}`
  echo "Starting incremental backup ${day} (at r${head})..."
  mkdir ${INCREMENTAL_PREFIX}-${day}
  mkdir ${INCREMENTAL_PREFIX}-${day}/${PROJ}
  cd ${REPOS}/db
  ${DB_ARCHIVE} > ${RECORDS}/${PROJ}-incr-backup-${day}-inactive-logfiles
  for logfile in `${DB_ARCHIVE} -l`; do
    # For maximum paranoia, we want repository activity *while* we're
    # making the incremental backup. But if we did commits with each
    # logfile copy, this script would be quite slow (Fibonacci effect).
    # So we only exercise on the last two "days" of incrementals.
    if [ ${day} -ge 5 ]; then
      exercise 3
    fi
    cp ${logfile} ${INCREMENTAL_PREFIX}-${day}/${PROJ}
  done
  cat ${RECORDS}/${PROJ}-incr-backup-${day}-inactive-logfiles | xargs rm -f
  cd ../..
  echo "Incremental backup ${day} done (r${head} was head when started)."
done

# The last revision a restoration is guaranteed to contain is whatever
# was head at the start of the last incremental backup.
last_guaranteed_rev=${head}

# Make the repository vanish, so we can restore it.
mv ${REPOS} was_${REPOS}

echo ""
echo "Oliver Cromwell has destroyed the repository! Restoration coming up..."
echo ""

# Restore.
#
# After copying the full repository backup over, we remove the shared
# memory segments and the dav/* stuff. Recovery recreates the shmem
# segments, and anything in dav/* is certainly obsolete if we're doing
# a restore.
#
# Note that we use db_recover instead of 'svnadmin recover'. This is
# because we want to pass the -c ('catastrophic') flag to db_recover.
# As of Subversion 1.0.x, there is no '--catastrophic' flag to
# 'svnadmin recover', unfortunately.
cp -a ${FULL_BACKUPS}/${PROJ}/repos/${REPOS} .
cp -a ${FULL_BACKUPS}/${PROJ}/logs/* ${REPOS}/db
rm -rf ${REPOS}/db/__db*
rm -rf ${REPOS}/dav/*
cd ${REPOS}/db
${DB_RECOVER} -ce
cd ../..
head=`${SVNLOOK} youngest ${REPOS}`
echo ""
echo "(Restored from full backup to r${head}...)"
for day in 1 2 3 4 5 6; do
  cd ${REPOS}/db
  cp ${INCREMENTAL_PREFIX}-${day}/${PROJ}/* .
  ${DB_RECOVER} -ce
  cd ../..
  head=`${SVNLOOK} youngest ${REPOS}`
  echo "(Restored from incremental-${day} to r${head}...)"
done
echo ""
echo "Restoration complete. All hail the King."

# Verify the restoration.
was_head=`${SVNLOOK} youngest was_${REPOS}`
restored_head=`${SVNLOOK} youngest ${REPOS}`
echo ""
echo "Highest revision in original repository: ${was_head}"
echo "Highest revision restored: ${restored_head}"
echo ""
echo "(It's okay if restored is less than original, even much less.)"

if [ ${restored_head} -lt ${last_guaranteed_rev} ]; then
   echo ""
   echo "Restoration failed because r${restored_head} is too low --"
   echo "should have restored to at least r${last_guaranteed_rev}."
   exit 1
fi

# Looks like we restored at least to the minimum required revision.
# Let's do some spot checks, though.

echo ""
echo "Comparing logs up to r${restored_head} for both repositories..."
${SVN} log -v -r1:${restored_head} file://`pwd`/was_${REPOS} > a
${SVN} log -v -r1:${restored_head} file://`pwd`/${REPOS} > b
if cmp a b; then
  echo "Done comparing logs."
else
  echo "Log comparison failed -- restored repository is not right."
  exit 1
fi

echo ""
echo "Comparing r${restored_head} exported trees from both repositories..."
${SVN} -q export -r${restored_head} file://`pwd`/was_${REPOS} orig-export
${SVN} -q export -r${restored_head} file://`pwd`/${REPOS} restored-export
if diff -q -r orig-export restored-export; then
  echo "Done comparing r${restored_head} exported trees."
else
  echo "Recursive diff failed -- restored repository is not right."
  exit 1
fi

echo ""
echo "Done."
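
The restore half could be wrapped up in a similar way. The sketch below
assumes the full repository copy is already back in place and that each
backup is a directory of logfiles, passed oldest first, as in the script
above. restore_logs and its arguments are hypothetical names; the only
Berkeley DB call is db_recover -ce, as in the script (-c for
catastrophic recovery, -e to retain the environment afterwards).

```shell
# Sketch only: restore_logs and its arguments are hypothetical names.
# Set DB_RECOVER to the full path if db_recover is not on your PATH.
DB_RECOVER=${DB_RECOVER:-db_recover}

# Usage: restore_logs REPOS_DIR LOG_DIR [LOG_DIR ...]
# LOG_DIRs are applied in the order given: full backup logs first,
# then each incremental, oldest to newest.
restore_logs ()
{
  repos=$1
  shift
  # Stale shared-memory segments and dav/* leftovers are obsolete
  # after a restore; recovery recreates the segments.
  rm -rf "${repos}"/db/__db*
  rm -rf "${repos}"/dav/*
  for logdir in "$@"; do
    cp "${logdir}"/* "${repos}/db/" || return 1
    # Run recovery from inside db/, without changing the caller's CWD.
    ( cd "${repos}/db" && "${DB_RECOVER}" -ce ) || return 1
  done
}
```

Running recovery once per log directory mirrors the loop in the script.
Whether a single recovery pass after copying all the logs would do as
well is not something this sketch tries to answer.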

---------------------------------------------------------------------
Received on Fri Apr 2 22:15:19 2004

This is an archived mail posted to the Subversion Users mailing list.