[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn_ra_get_file_revs2 vs. blame vs. FS API

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 17 Feb 2014 12:28:42 +0000 (GMT)

Stefan Fuhrmann wrote:
> r1568600 uncovered an inconsistency in our API usage / interpretation
> making blame -g tests fail for FSX.
>
> The starting point is svn_fs_contents_changed and svn_fs_props_changed.
> FSFS and probably BDB implement those as "has there been an intermittent
> change?" E.g. if the result of some merge yields equal content on both sides,
> they will still be considered "changed". That is a violation of the API definition
> basically asking "is the delta empty?". Even the FSX code is not fully consistent
> with that, yet.

BDB and FSFS and FSX all implement the comparison in exactly the same way: "does it have the same rep-key?". This does not mean "is the content identical?", but it does not necessarily mean "has there been an intervening change to the content?" either. It only means "was the same representation instance chosen to represent this content?". For example, I think it might be possible for this to return "different" even if there has been no intervening change to the content, if the rep cache was disabled and re-enabled during a series of commits, although that depends on even more implementation details.

One thing we can say for certain is that, despite the doc string, these APIs have always been implemented to return "the same" in some cases when the content is known to be the same, and return "different" in cases where the content might be different but might not be.

Some of the callers use the call to optimize whether to fetch and compare the content. That's fine of course.

Some other callers expose this behaviour to higher level APIs, where the caller might be expecting a definite answer to "Does the content differ?", and that's where the problems begin.

Both semantics are valid, of course, so we might end up wanting to provide both.

> svn_ra_get_file_revs2 uses svn_fs_contents_changed to decide whether
> a delta should be sent to the given handler. The false positives returned by
> FSFS and BDB are benign since we simply end up sending an empty delta
> (a mere waste of CPU cycles).

And note that the doc string for svn_ra_get_file_revs2() explicitly says these false positives can be returned.

> However, our blame implementation relies on those false positives because
> it tracks contents from revisions to be merged later separately from "mainline"
> changes. When the merge finally happens but results in the same contents
> as the previous branch revision, the delta handler must be called to pull in
> the blame info from the branch. The latter no longer happens with the strict
> interpretation of svn_fs_contents_changed.

Without following the details, I think what you mean about merges is that the blame algorithm is trying to report an event whenever a merge happened, even if that merge did not result in a content change. The gist of it seems to be that the blame implementation assumes:

  * that a new content rep-key is assigned whenever a merge is committed, even if the content is unchanged, but not at other times when the content is unchanged; and

  * that svn_ra_get_file_revs2() will notify if and only if a new rep-key is assigned.

or  something very much like that. Those assumptions look like the real problem here.

> I see the following options:
>
> * Redefine svn_fs_contents_changed and svn_fs_props_changed to match
>   the current implementation. I'd rather not but that just a -0 from my side.

We should deprecate them, and adjust their doc strings with a historical note about the discrepancy, and replace them with one or two new (pairs of) APIs:

  -- a quick optimization API -- the definitely/maybe question -- like the existing implementations but documented properly and named appropriately; and

  -- (perhaps) a definite content comparison API: "has the content changed?"

> * Make svn_ra_get_file_revs2 always accept a delta handler unless the
>   previous change was on the same branch and the delta is empty (e.g.
>   just a prop change). Sounds arbitrary, so -0 on that too.

I don't understand this one at all. Do you mean "call the delta handler unless..."? What does "same branch" mean in Subversion? Why would a prop change be considered an empty delta?

> * Make svn_ra_get_file_revs2 accept delta handlers (roughly) as today
>   but use the log information for that, i.e. whenever the content modified
>   flag is set for the respective file path. That would be sound approach
>   since we are in rev_hunt.c anyway. The info may already be available
>   (haven't checked yet).

Not sure what you mean there, exactly.

> * Make the blame implementation update the mainline blame info even
>   if no new delta info came in but the previous callback wasn't for the
>   mainline.

Sorry, I can't follow that one either without digging into the blame code.

Clearly the blame implementation needs to be changed, but if we can start by determining what question it needs to know the answer to and define suitable APIs for answering it, that would be a good way to proceed.

- Julian

> I suggest to do any of the last two or even both.
Received on 2014-02-17 13:42:04 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.