Re: backing up a repository over a network

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2003-06-10 01:59:39 CEST

Faheem Mitha <faheem@email.unc.edu> writes:

> You misunderstand. This is not a complaint or request for help. I just
> wondered why, if the files created are temporary, that the repository
> appears to have changed even after the transaction (eg. `svn co ...') was
> completed. Clearly this is no problem as such.

Here's the detailed answer you're looking for.

Any time you write to Berkeley DB (BDB) tables, BDB writes information
into its private logs. This is how BDB is able to implement low-level
transactions, roll them back, or restore the whole database to a
consistent state after a system crash.

So when you hear us talking about 'repository logs', it has absolutely
nothing to do with the 'svn log' command or commit logs. We're
talking about BDB's internal logging facility.

If you administer an svn repository, you need to prune the BDB logs
from time to time using the 'db_archive' utility; otherwise they grow
forever. If you never delete the BDB logs, then in theory you could
replay every change that has *ever* happened to the BDB tables from
the very beginning of time, but most people don't want or need that.
Normally you just want enough log files lying around so that you can
restore the database to the "last known good" state.
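
As a rough illustration of the pruning step, here is a minimal Python
sketch. It assumes 'db_archive' is on your PATH and that its default
output (the logfiles no longer in use) is what you want to delete. The
function names and the dry_run parameter are made up for this example;
they are not part of Subversion or BDB.

```python
import os
import subprocess


def unused_logs_command(env_dir):
    """Build the db_archive command that lists logfiles no longer in use.

    -h names the BDB environment directory; -a asks for absolute paths.
    """
    return ["db_archive", "-a", "-h", env_dir]


def prune_unused_logs(env_dir, dry_run=True):
    """List the unused logfiles, and delete them unless dry_run is set."""
    result = subprocess.run(
        unused_logs_command(env_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    # db_archive prints one pathname per line.
    paths = [line for line in result.stdout.splitlines() if line]
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

Run it with dry_run=True first to see what would be removed. The
directory you pass is the BDB environment inside your repository; its
exact location depends on your setup.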

Here are two more implications:

1. As cmpilato said earlier: running 'svn up' creates a temporary
   tree in the repository, one which mirrors the working copy. After
   the temporary tree is compared to the HEAD tree, the temporary tree
   is deleted. So even though 'svn up' is a "read" operation from a
   user's perspective, it still involves writes to the BDB tables.
   That means you could create a repository, and if people never do
   anything but run 'svn co' or 'svn up', the BDB logfiles *will*
   still grow without bound, though very slowly.

2. To back up a BDB "environment" (directory containing BDB tables and
   logs) while the repository is "online" or "live" (being accessed),
   you need to follow a specific procedure (which is in the BDB
   documentation). First, copy the entire directory elsewhere. Then,
   go back and re-copy all the logfiles, because they may have been
   *changed* during the initial copy. Then run 'db_recover' on the copy
   to make sure the logged actions are synchronized with the tables.
   This is what we mean by a "hot backup", and this is what our
   hot_backup.py script does. If you run 'rsync' directly on a live
   repository, it's not going to follow this procedure, and thus you're
   running into the problems you originally mentioned. Better to
   rsync the hot-backed-up copy instead.
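
The copy, re-copy, recover sequence can be sketched in Python roughly as
follows. This is only an illustration of the three steps, not the actual
hot_backup.py. It assumes the BDB logfiles are the files named 'log.*'
in the environment directory and that 'db_recover' is on your PATH; the
function name and the recover parameter are invented for the example.

```python
import os
import shutil
import subprocess


def hot_copy(src_env, dst_env, recover=True):
    """Copy a live BDB environment using the copy / re-copy / recover steps.

    dst_env must not exist yet. Returns the names of the re-copied logfiles.
    """
    # Step 1: copy the entire environment directory (tables and logs).
    shutil.copytree(src_env, dst_env)

    # Step 2: re-copy the logfiles, since they may have changed (or new
    # ones may have appeared) while step 1 was running.
    logs = sorted(name for name in os.listdir(src_env)
                  if name.startswith("log."))
    for name in logs:
        shutil.copy2(os.path.join(src_env, name),
                     os.path.join(dst_env, name))

    # Step 3: run recovery on the copy only, never on the live
    # environment, so the tables match what the logs say.
    if recover:
        subprocess.run(["db_recover", "-h", dst_env], check=True)

    return logs
```

Once this finishes, the directory at dst_env is the one that is safe to
rsync elsewhere.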

If you want more detail than this, you need to read the BDB docs at
sleepycat.com. :-)

Received on Tue Jun 10 02:01:25 2003

This is an archived mail posted to the Subversion Dev mailing list.