Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 22 Apr 2010 17:07:56 -0400

On Thu, Apr 22, 2010 at 15:48, Daniel Näslund <daniel_at_longitudo.com> wrote:
> On Thu, Apr 22, 2010 at 02:45:08PM -0400, Greg Stein wrote:
>> On Thu, Apr 22, 2010 at 10:05, Daniel Näslund <daniel_at_longitudo.com> wrote:
>...
>> > * subversion/tests/cmdline/stat_tests.py
>> > (status_with_tree_conflicts): An copied node should not have a
>> > changed_rev or author.
>>
>> Don't the changed_rev and changed_author correspond to the repository
>> node that was copied into the tree? Thus, when I'm looking at a diff,
>> it is effectively relative to what I copied in.
>
> I had my doubts about this one, although I didn't express them here.
> Since a copied or moved node has a previous line of history it makes
> sense to use that history for changed_rev and changed_author.

Right.

> My
> motivation for not setting those for copied/moved nodes was that I
> thought that changed_rev was defined as the closest rev less than or
> equal to the current revision. Since changed_author is closely related
> to changed_rev I set that one to it's nil value too.

changed_rev is "for the path_at_rev you're looking at, when was path last changed?"

so if you have a history like:

1
2 path changed
3
4
5 path changed
6
7

Then changed_rev=0 for path_at_0 and @1. changed_rev=2 for path_at_2..4. and
changed_rev=5 for path_at_5..7.

IOW, if I update my working to rev R, then changed_rev will indicate
when wc/some/path was last changed, and that will always be <= R.

Nodes that are copied from elsewhere will have copyfrom_* and
changed_* information, and changed_rev <= copyfrom_rev.

>> > * subversion/tests/cmdline/svntest/actions.py
>> > (run_and_verify_status): Add new parameter check_entries. We only
>> > check the entries if check_entries is set to True.
>>
>> No way. For the *one* item which needs a different revision... maybe
>> okay. But by eliminating the check_entries, you're also eliminating
>> the checks on all the *other* items in the status output.
>>
>> Instead, I recommend changing svntest/wc.StateItem. Add an "entry_rev"
>> attribute defaulting to None. Where we expect the entry's revision to
>> differ from that reported in a typical status output, then we can set
>> this extra attribute and specifically track the change in behavior.
>
> Ok, I of course prefer this suggestion since it offers the same test
> coverage but allows me to change the behaviour. But I would have _never_
> come up with it on my own.

Yeah, don't worry. I thought it was going to be pretty difficult to
monkey things around until you mentioned tweak_entries_for_compare
later in your message, and then I went "oh yeah. forgot about that.
this should be much easier than I expected" :-P

>...
>> > (svn_wc_status2_t): Add fields revision, changed_rev and
>> > changed_author.
>>
>> If you have changed_rev and changed_author, then please include
>> changed_date. It is weird/inconsistent to have two of the three.
>
> Ok, the reason it was left out was due to me looking in svn/status.c to
> determine what fields were used. I didn't see it so it was not included.
> I took the idea of the status code just returning the most cruciual
> parts too literally. What I should have done was look at what
> performance impact there was to including different fields. If I include
> the others, there won't be much overhead (virtually zero) to including
> the changed_date as well. Will do.

Yup. And they group together "as a unit".

>...
>> And now that you've changed status3, you should alter the
>> svn_wc__status2_from_3() function.
>
> Since the old code uses the fields from the entry and the entry is still
> there, nothing needs to be changed in status2_from_3(), *yet*.

And since you added the fields at the end. But still... this line:

*new_stat = (svn_wc_status2_t *)svn_wc_status_dup(old_stat);

Is kinda fishy. Yes, it still happens to work, but the two structures
are not "identical" any more.

I'm not too worried right now, but what if somebody else makes a
change, and doesn't realize this fact? If a field is removed, then the
2_from3 would break (if you had manual copying). If a field is added,
then it *may* work depending on where it was added. You can keep it
straight, but how aware are the other devs of this sneakiness?

>...
>> > ii) Copies from foreign repos to wc has the rev set to -1
>>
>> Hmm? "foreign repos" only makes sense for merges, I believe. I think
>> you mean "copy from (our) repository to wc".
>
> Nope, check out copy_tests.py:978.

Oh! Hunh. Learn something new every day.

I'm guessing they just turn into normal adds then? And if that's true,
then I wonder where the heck the 1 ever came from. Why is
entry->revision == 1 in that case?

>...
>> read_info() returns changed_* information for copied nodes. A copied
>> node has a corresponding node in the repository, and the changed_*
>> information reflects that. (thus, your assemble_status should not be
>> wiping the values out... if the caller wants to interpret added nodes
>> that way, then it can)
>
> Ok, for the reasons I stated earlier.
>...
>> Personally, I would suggest that assemble_status() does NOTHING more
>> than returning read_info's REVISION result. That is the "ideal" value
>> for the revision of the node LOCAL_ABSPATH.
>>
>> If that revision is NOT what you want (ie. you have some other rules
>> like "well... if it is added, then ..." or "if it is deleted, then
>> ..."), then the caller should perform those calculations explicitly.
>> The status code should not pre-suppose what rules the caller may want
>> for revision handling. It should simply report the "ideal" revision
>> for that node and be done with it.
>
> WHAT? You're saying that all the scanning should take place in the
> client callback?

No. I said "don't create strange semantics".

The code you had in there had the following definition for status->revision:

"The revision of the BASE node, unless that has been deleted, then: if
nothing was added, then still the same BASE revision, but if something
was added, then SVN_INVALID_REVNUM. BUT! If you delete a child of that
addition, then the BASE version MAY be returned again... we say "may"
because if a BASE doesn't correspond to the deleted child of the add,
then you'll get an error instead."

The definition that read_info uses is:

"The revision of the unmodified (BASE) node, or SVN_INVALID_REVNUM if
any (structural) changes have been made to that node."

(where "structural" means an add/copy/move/delete; we don't mean
content edits like propchanges or editing the text)

> That's a major change from what I've done so far. That
> I should just return SVN_INVALID_REVNUM for cases where I've scanned for
> deletion? Is that a behaviour that everyone agrees on? You're saying
> personally so I'm assuming there hasn't been a discussion about it.

No, I don't think it should do a scan_add, nor a scan_deletion. Report
the add/delete, and leave it at that.

repos_relpath, repos_root_url, and repos_uuid are all inherited
values, so I believe a scan_base_repos makes sense: you're simply
returning those values. You're not looking for further details about
operations (which the caller may not be interested in anyways).

(and note that scan_base_repos doesn't even have to be used cuz
assemble_status could be passed the parent repos_* values; if the BASE
node hasn't been switched and (therefore) returns NULL for repos_*,
then its repos_* values are easily derived within wc/status.c; but
repos_* values will always be NULL for modified nodes (ie. if a
WORKING node exists))

Scanning additional rows is not "free", so we shouldn't be looking up
information from the ancestors unless/until we find that most/all of
the callers need that information. And in some cases, we can even make
status.c fill in the info, since it has seen the ancestors already.

In the future, we may return added/copied/moved_here distinctly so
that a scan won't be needed to resolve that (tho you'd still have to
scan to locate the root of the operation, tho we may be able to
optimize that inside wc_db).

Cheers,
-g
Received on 2010-04-22 23:08:24 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]