
Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted (Take 2)

From: <cmpilato_at_collab.net>
Date: 2003-08-04 16:40:42 CEST

John Szakmeister <john@szakmeister.net> writes:

> Any recommendations? I can't say that I'm very familiar with Berkeley's
> methodology, so I'm at a loss for any sort of suggestion, other than to say
> that 'svnadmin dump' seems to work okay. :-) What does that code path do
> differently than this technique (it calls svn_repos_dir_delta() to iterate
> over the contents)? I've never seen an instance where it fails.

Actually, these days 'svnadmin dump' uses svn_repos_replay() (the new
and improved fork of svn_repos_dir_delta()). The reason we don't have
lock issues with this, as with most other filesystem functionality, is
that we hold the Berkeley locks (which, for the purposes of this
discussion, can be mapped to Subversion's trails) for only a short time
each -- we get into the database, find some smallish piece of data (like
a directory entries list, or the change records for a specific revision),
and get outta there. So what happens is that we are calling
begin_trail() and commit_trail() a crazy number of times, but each trail
is used for a relatively small task.

Your patch, however, creates a trail at the start of processing, and
we keep it open while we read every representation and string in the
database -- an arbitrarily large dataset. (And I'm not very clear on
this, but I think Berkeley locks things per-page).

The downside is that the solution is nasty. It requires that you:

   1. Lose the representations walker. Instead you are now doing all
       your work up in the main svn_fs_verify() function.
   2. Use a trail to get the 'next-key' value from the repo.
   3. Call svn_fs__prev_key()* on that value (this will be "the current key").
   4. Try to fetch the representation at the current key. If you
       fail, no sweat.
   5. But if you succeed, read the rep data (which checks the checksums).
   6. If "the current key" != "0", goto step 3 and repeat, otherwise,
       you're done.

So, at most, the only time you spend in a given trail is the time
taken to fetch a single representation, or the time taken to read a
chunk of data from it.
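
To make the shape of that loop concrete, here is a rough sketch in
Python rather than C, and emphatically not Subversion code. Everything
in it is made up for illustration: FakeFS, its trail() helper,
prev_key(), and verify() are hypothetical names, and it assumes that
keys are base-36 strings and that each representation carries an MD5
checksum. The point is only that every trail covers one small fetch or
one read, never the whole walk.

```python
import hashlib
import string
from contextlib import contextmanager

# Assumption: keys are base-36 strings using digits then lowercase letters.
DIGITS = string.digits + string.ascii_lowercase


def prev_key(key):
    """Return the base-36 key just before `key` (hypothetical helper)."""
    value = int(key, 36)
    if value == 0:
        raise ValueError("no key precedes '0'")
    value -= 1
    out = ""
    while value:
        value, rem = divmod(value, 36)
        out = DIGITS[rem] + out
    return out or "0"


class FakeFS:
    """In-memory stand-in for the repository; not a real API."""

    def __init__(self, reps, next_key):
        self.reps = reps          # key -> (data bytes, expected md5 hex digest)
        self.next_key = next_key  # the key that would be handed out next
        self.trails = 0           # how many short trails were opened

    @contextmanager
    def trail(self):
        # Stands in for begin_trail()/commit_trail(); here it only counts.
        self.trails += 1
        yield self


def verify(fs):
    """Walk keys downward from next-key, one short trail per operation."""
    with fs.trail():                      # step 2: fetch 'next-key'
        key = fs.next_key
    corrupt = []
    while key != "0":                     # step 6: stop once "0" is done
        key = prev_key(key)               # step 3: this is "the current key"
        with fs.trail():                  # step 4: try to fetch the rep
            rep = fs.reps.get(key)
        if rep is None:
            continue                      # no rep at this key: no sweat
        data, expected = rep
        with fs.trail():                  # step 5: read data, check checksum
            if hashlib.md5(data).hexdigest() != expected:
                corrupt.append(key)
    return corrupt
```

For a next-key of "3" with representations stored at "0" and "2", this
opens six trails in total (one for next-key, three fetches, two reads),
and none of them outlives a single small operation.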

Questions? Comments? Are you ready to throw up your hands and quit
yet? (please say no -- you've been a terrific sport through this)


*svn_fs__prev_key() doesn't really exist, at least not in the trunk
 code. I wrote it (thinking I needed it) on the fs-schema-changes
 branch. You can snarf it from:


Received on Mon Aug 4 16:42:43 2003

This is an archived mail posted to the Subversion Dev mailing list.