[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit r1435746 - svn_fs_verify() - issues #3995, #4211

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 28 Jan 2013 01:36:53 +0100

On Wed, Jan 23, 2013 at 6:56 PM, Julian Foad <julianfoad_at_btopenworld.com>wrote:

> > URL: http://svn.apache.org/viewvc?rev=1435746&view=rev
>
> > Log:
> > Fix issues #3995 (redesign svn_fs_verify() for 1.8) and #4211 (verify
> > is slow and needs to handle node verification better).
> >
> > This patch does two things. First, it adds a notification callback
> > to svn_fs_verify() and forwards its output to our console stream.
> >
> > The second change is that revision-specific checks are begin moved
> > from svn_fs_fs__verify() to a new svn_fs_verify_rev() function.
> > Because the latter is being called as part of the per-revision checks,
> > progress is visible and access / cache locality is very much improved.
>
> The latter is now being called as part of svn_repos_verify_fs2()'s
> per-revision checking loop (not as part of svn_fs_[fs__]verify()'s per-rev
> checks, which is how I interpreted this at first).
>

I updated the log message to make that clear.

> Because of that, the already existing '* Verified revision R' notification
> from that loop now means both 'dump revision R' and svn_fs_verify_rev(R)
> have been done. There will no longer be the initial long delay which was
> previously caused by svn_fs_verify() verifying all revs in the repo before
> the first '* Verified revision R' notification.
>

But it may still take quite some time because all representations
will be accessed once to verify that they actually exist as promised
by the rep-cache. Reading 10 .. 100GB from disk will still take time.

> But you are still calling svn_fs_verify(..., start, end, ...) before the
> per-rev loop. This was puzzling until I saw that the 'start' and 'end'
> arguments here are no longer used for per-rev full verification but are
> still used for rep-cache verification.

This revision range is rather an optimization reducing the amount
of data to process. Depending on the nature of the tests we might
perform in the future, findings / error messages may still refer to
revisions outside that range. E.g. "r123 based on r99, which is empty".

This is now the only potentially long-running check before the main loop
> starts, and it now sends its own notifications.
>

Depending on what you check, the order may also be non-linear
and even repeating. The only point here is to give the user some
heartbeat signal and ideally sense of progress.

>
> So the overall output is now like this (on one of my test repos):
> * Verifying global structure ...
> * Verifying structure at revision 74 ...
> * Verifying structure at revision 481 ...
> * Verifying structure at revision 831 ...
> * Verified revision 0.
> * Verified revision 1.
> * Verified revision 2.
> * Verified revision 3.
>
> OK. Maybe s/structure/repository metadata/; what do others think?
>

Done in r1439211.

>
> > /**
> > * Perform backend-specific data consistency and correctness validations
> > * to the Subversion filesystem located in the directory @a path.
> > *
> > * @a start and @a end may be #SVN_INVALID_REVNUM, in which case
> > * svn_repos_verify_fs2()'s semantics apply. When @c r0 is being
> > * verified, global invariants may be verified as well.
> > * ... */
> > svn_fs_verify(path, cancel..., notify..., start, end, pool);
>
> This needs to say what the START and END parameters are for, and make
> clear that it doesn't call svn_fs_verify_rev() for each rev.
>

Tried to clear that up in r1439214.

> > /**
> > * Perform backend-specific data consistency and correctness validations
> > * to revision @a revision of the Subversion filesystem @a fs.
> > * ... */
> > svn_fs_verify_rev(fs, revision, pool);
>
> I'm still unclear of the relationship between svn_fs_verify_rev() and
> performing a dump of the rev. I assume they are complementary, but it
> seems odd, as it feels to me like a good implementation of
> svn_fs_verify_rev() would do all that a dump does and more.
>

Some tests are very expensive to set up. E.g. rep-cache.db
does not have a revision index and will thus we always do
a full table scan.

> > /* Verify the fsfs filesystem FS. Use POOL for temporary allocations. */
> > svn_fs_fs__verify(fs, cancel..., notify..., start, end, pool);
> >
> > /* Verify REVISION in filesystem FS. Use POOL for temporary
> allocations. */
> > svn_fs_fs__verify_rev(fs, revision, pool);
> >
> > /* Verify metadata for ROOT.
> > ### Currently only implemented for revision roots. */
> > svn_fs_fs__verify_root(svn_fs_root_t *root, pool);
>
> Similar docs are needed here too, or a cross-reference such as "Implements
> svn_fs_verify...() for FSFS".
>

Done in r1439210.

The order of parameters in svn_fs_verify() and the corresponding vtable
> function and similar places would be more consistent with elsewhere if
> rearranged as:
> (main params,
> notification,
> cancellation,
> pools).
>

Done in same revision.

> > Modified: subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c
> >
> ==============================================================================
> > --- subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c (original)
> > +++ subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c Sat Jan 19
> 22:45:02
> > @@ -135,6 +135,8 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
> > void *walker_baton,
> > svn_cancel_func_t cancel_func,
> > void *cancel_baton,
> > + svn_fs_progress_notify_func_t notify_func,
> > + void *notify_baton,
> > svn_revnum_t start,
> > svn_revnum_t end,
> > apr_pool_t *pool)
> > @@ -164,6 +167,9 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
> > SVN_ERR(svn_sqlite__reset(stmt));
> > if (SVN_IS_VALID_REVNUM(max)) /* The rep-cache could be empty. */
> > SVN_ERR(svn_fs_fs__revision_exists(max, fs, iterpool));
> > +
> > + if (notify_func)
> > + notify_func(SVN_INVALID_REVNUM, notify_baton, iterpool);
> > }
> >
> > SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db,
> > @@ -210,6 +216,16 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
> > return svn_error_compose_create(err, svn_sqlite__reset(stmt));
> >
> > SVN_ERR(svn_sqlite__step(&have_row, stmt));
> > +
> > + /* Notify (occasionally, because walking is fast and we can't
> > + guarantee a properly ordered notification sequence anyway) */
> > + if ( notify_func
> > + && (iterations % 1024 == 0)
> > + && (rep->revision != last_notified_revision))
> > + {
> > + notify_func(rep->revision, notify_baton, iterpool);
> > + last_notified_revision = rep->revision;
> > + }
> > }
>
> This function already calls a callback as its main purpose. We don't need
> to add a separate notification
> mechanism so it can tell us that it's called the callback: we can
> implement whatever notification we want, *inside* the
> existing callback.
>
> There's only one special case here, which is you put a notification in
> after it does the initial global invariant. That's not really needed, as
> the caller knows it's about to be done when it requests start==0, and knows
> it's been done when it gets the first callback. (If you really want to
> make a call at this point, it would be equally OK to call the main callback
> with REP=NULL.)
>
> Advantages: (1) simplicity; (2) the *user* of the function controls how
> often it notifies: it might want to do so on the basis of elapsed time
> rather than number of loop iterations for example.
>

Good point! Done in r1439212.

Once again an excellent review, Julian!

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*
http://www.wandisco.com/subversion/download
*
Received on 2013-01-28 01:37:26 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.