[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: verify_as_revision_before_current_plus_plus() on a production repo?

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Tue, 31 Jan 2017 11:06:07 +0000

Julian Foad wrote on Tue, Jan 31, 2017 at 10:22:06 +0000:
> Daniel, thanks for your comments.
>
> Daniel Shahaf wrote:
> >Julian Foad wrote on Mon, Jan 30, 2017 at 21:40:00 +0000:
> >>it is a little "tricky": in particular, it opens a second FS instance and
> >>fakes the "youngest revision seen" field and then relies on this FS instance
> >>never reading the "current" file
> >
> >Right: if some code refreshes youngest_rev_cache (by open()ing
> >'current', reading a value from it, and setting youngest_rev_cache to
> >that value), that will cause root->rev to be "newer than youngest".
> >
> >I think the 'verify' code already has to deal with this possibility,
> >since 'recover' can backdate 'current' in the following situation:
> >.
> > 1. svn_fs.h consumer opens an svn_fs_root_t for r42
> > 2. invisible monkeys delete db/revs/0/42 and db/current
> > 3. admin runs 'svnadmin recover', which regenerates 'current' as 41
> > 4. svn_fs.h consumer calls verify() on the root it had opened earlier
>
> I don't buy this. verify_root() is not required to successfully verify r42
> in this scenario.

I agree with your last sentence, but I think you misunderstood me.
I was simply arguing that the "root->rev is younger than value in
'current' on disk" is a situation that can happen in normal API usage,
and therefore calling svn_fs_verify_root() in that situation is *not*
undefined behaviour. That is: the call may either succeed or error out,
but may not assert or segfault (or grow the proverbial toaster's arm).

> >Moreover, the 'verify' code is inherently the place where violated
> >invariants are least likely to cause trouble, and it's read-only.
> >Therefore, while a bug might cause a false positive verification error
> >that rejects a commit, I don't see any worse outcome. (If there's
> >a failure mode here that I overlooked, it's most likely to be in the f7
> >code, since I haven't worked much with those parts of fsfs.)
> >
> >All that said, I agree that checking after the _verify_root() call that
> >root->rev and youngest_rev_cache haven't changed would be an improvement.
> [...]
> >
> >>nor using anything that would have been done post-commit.
> >
> >That's a good point: the svn_fs_fs__verify_root() must not add any
> >permanent references to the revision it thinks is youngest — e.g., it
> >must not add reps it traverses to rep-cache.db. That's true today but
> >not necessarily true forever.
>
> Interesting observation. It would be good the verify could use a read-only
> FS instance, but I don't think we have such a mechanism.

Not today, but it would be easy enough to add to fs_fs_data_t a boolean
that tells with_some_lock() to just error out, and to have rep-cache.c
respect that boolean as well. (See also the next bullet)

> I meant the opposite direction: a successful commit does some things
> post-commit (e.g. remove the txn directory, update the fulltext cache,
> update the rep cache) and this verify must not assume any of those things
> have been done already.

Taking these one by one:

- remove the txn directory: this wouldn't happen if the committing
  process is SIGKILLed just after bumping 'current', so 'verify' already
  has to deal with this.

  In theory, there could be a bug the other way here: if verify()
  removed that "orphaned" txn dir, the original commit process — which
  is "on hold" pending verify() returning — might try to rmdir() that
  directory when it resumes, and fail with ENOENT. This specific
  situation would have been caught by 'make check' in maintainer mode,
  of course, but in general, that's another way in which we rely on
  'verify' being a read-only operation.

- update the fulltext cache: the second FS handle already runs with
  empty caches (see the svn_uuid_generate() call).

- update the rep cache: verify_as_revision_before_current_plus_plus() is
  called before rep-cache.db is updated (bumping 'current' is
  a fence separating those two), and in any case, 'verify' already has
  to deal with rep-cache entries being absent in case a revision had
  been committed while enable-rep-caching=false (in fsfs.conf) was in
  effect.

So that's these three. Is there anything else that's updated between
bumping 'current' and returning control to svn_fs_commit_txn()'s caller?
Or some kind of process-global or machine-global cache shared between
the two FS handles, despite the different cache namespaces?

> >I also wonder if having verify_as_revision_before_current_plus_plus()
> >run in a child process would gain anything.
> >
> >>verify_as_revision_before_current_plus_plus() is currently compiled in to
> >>debug builds but not to release builds. We can say therefore it gets
> >>reasonable coverage in test suite runs but has had little or no real-world
> >>testing.
> >
> >Indeed. How about enabling that function in the alpha1 release so we
> >can get some more feedback about it?
>
> I like that idea: it seems like entirely appropriate behaviour for an alpha
> or beta release, and we'd probably get no direct feedback and this would be
> a good sign that it's working ok.

Exactly my thinking.

Cheers,

Daniel
Received on 2017-01-31 12:11:18 CET

This is an archived mail posted to the Subversion Dev mailing list.