[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Tue, 6 Oct 2015 11:38:00 +0200

On Sun, Oct 4, 2015 at 4:28 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> writes:
>
> > Right now, we are losing information. In rare cases and probably nothing
> too
> > important but still. This aspect makes me consider the current behaviour
> a
> > bug. Whether creating that situation in the first place was o.k. nor not
> is
> > a separate issue.
>
> To my mind, the current behavior is a regression, because:
>
> - svnadmin dump produces different dumps for the same repository, depending
> on the version (1.8 or 1.9)
>

That is not a problem because we don't guarantee identical
dump files between releases. And only since 1.8 (?) do we
even create reproducible dump files for a given release.

> - svnadmin dump and svnrdump dump in 1.9 produce different dumps for the
> same repository
>

Not sure whether that is new to 1.9 but when it is, it needs
to be fixed.

> - After a dump-load sequence with svnadmin from 1.9, the resulting
> repository
> behaves differently from the original in operations like 'svn log'
>

Different "behaviour" is expected: I will most likely see
different transaction IDs, different rep sharing, different
pack status etc.

Ideally, none if these would change the output of client-
side operations, but I would settle for "no information
lost".

> I committed a failing test in r1706428, and I also think that we have quite
> a problem to solve — see below.
>

Thanks for the test case. My patch should fix that. I
agree that the broader scope you are asking for new
feature that will require some work.

> [...]
>
> > I suggest that we apply the patch below. It is more efficient that the
> > original behaviour but ensures that "no-op" changes will be retained.
>
> [...]
>
> > V2 of the patch also covers no-op prop changes to directories.
>
> Although the V2 patch seems to mitigate the problem in one particular case,
> it does not solve it generally. For instance, here is another scenario
> where
> 1.8 and 1.9 binaries produce different dumps, even with the patch applied:
>
> svnmucc cp 1 bar baz put empty-file baz
>
> Node-copyfrom-path: bar
> Text-copy-source-md5: d41d8cd98f00b204e9800998ecf8427e
> Text-copy-source-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
> -Text-content-length: 0
> -Text-content-md5: d41d8cd98f00b204e9800998ecf8427e
> -Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
> -Content-length: 0
>

What information is being lost here? That change should show
up as added-with-history. It simply no longer claims to also
change the file contents.

You are basically asking here for a new "T" ("touched") state
alongside the traditional "M". Subversion does not have that
concept formally, yet, and any attempt to mimic it relies heavily
on implementation details of your respective repository backend.

Sadly, some of our logic in the higher layers either tried to exploit
knowledge of the repo's internal workings or simply did not care
to specify whether "touched" is a subset of "modified" or not.
As a result, underspecified APIs leak unspecified behaviour up
to the client.

Introducing "T" consistently will require some work. Most of it
should be moderately easy. Two areas are problematic, though:

* Does a date change in the working copy constitute a "touch"?
  That would break many people's work flows. So, 'svn touch'
  would probably be an explicit sub-command.

* Under what circumstances would a merge produce "T"?
  That leads directly to the question of what the semantics
  of touch would be plus a problem similar to "T" in working
  copies.

> I examined the history of fixing problems caused by r1572363 + r1573111,
> and
> this is the second time when we find ourselves whack-a-moling the calling
> sites
> of svn_fs_props_different() and svn_fs_contents_different(). Previously,
> we
> had issues with misbehaving 'svn blame -g' [1]. This ended up with placing
> a hack in subversion/libsvn_repos/rev_hunt.c, and with rather questionable
> compatibility promises for svn_ra_get_file_revs2():
>
> [[[
> else if (sb->include_merged_revisions
> && strcmp(sb->last_path, path_rev->path))
> {
> /* ### This is a HACK!!!
> * Blame -g, in older clients anyways, relies on getting a
> notification
> * whenever the path changes - even if there was no content change.
> *
> * TODO: A future release should take an extra parameter and
> depending
> * on that either always send a text delta or only send it if there
> * is a difference. */
> contents_changed = TRUE;
> }
>
> ...
>
> * @note Prior to Subversion 1.9, this function may request delta handlers
> * from @a handler even for empty text deltas. Starting with 1.9, the
> * delta handler / baton return arguments passed to @a handler will be
> * #NULL unless there is an actual difference in the file contents between
> * the current and the previous call.
> *
> * @since New in 1.5.
> */
> svn_error_t *
> svn_ra_get_file_revs2(svn_ra_session_t *session,
> const char *path,
> ...
>
> ]]]
>
> The r1572363 + r1573111 patchset implicitly changes behavior of around six
> different callers, and two of them have already backfired with problems.
> Moreover, apart from switching everything to the new API, these revisions
> change the behavior of *existing* API like svn_fs_contents_changed(). We
> have an erratum describing the change [2], but I doubt that API users can
> properly adjust their code, if they notice this change — because even we
> failed
> to do so.
>

'svn blame -g' is a prime example of exploiting unspecified
FS API behaviour. 1.9 does fixes that aspect, while leaving
'svn blame -g' in it's "approximate - at best" state. IOW, we
fixed a layering violation and had to add workarounds for
old clients that relied on it.

> Finally, I cannot understand the practical benefits from doing this. If
> these
> changes are aimed at making the checks more strict, but break existing
> code,
> I don't think we should be doing this. If this is an optimization, I'd
> favor
> correctness over it.
>

This is not about performance, it is about API guarantees
and functional stability. So, yes that is an optimization in
the sense of "making it better than before". And no, we
should not go back to worse unless there is no other
practical way.

> I propose that, instead of patching the callers in response to problems, we
> restore every relevant part to its 1.8.x state, where they're known to be
> stable
> and work.

It is the *callers* who are the problem and the explicit API
of 1.9 simply exposes that fact. Therefore, fixing the callers
is the right thing to do. To get fully behaviour, we need to
decide upon "M" vs. "T", though. IMO, this is beyond the
scope of this thread.

> This would allow us to remove the hacks like the one associated with
> fixing svn blame -g, would restore the proper behavior of svnadmin dump,
> and
> would also allow us not to deal with similar problems in other places, if
> they
> exist.
>
> How does this sound?
>

Well-intended but, in this case, terrible.

-- Stefan^2.
Received on 2015-10-06 11:38:17 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.