[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Status of the branch diff-optimizations-bytes branch

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Fri, 28 Jan 2011 13:49:49 +0100

On Fri, Jan 28, 2011 at 5:51 AM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
> On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
>>> Johan (and other interested parties),
>>> I've been following some of the commits to the
>>> diff-optimizations-branch with interest.  While I've not reviewed them
>>> for technical merit, it appears that others have, and that there is
>>> good work going on in the branch.
>>
>> Thanks for your interest.
>>
>> It might be good to get a little bit more review on the whole, I
>> think. A lot of people have read parts of the code, but if I remember
>> correctly most (if not all) of them have said things like "I haven't
>> reviewed it in detail, but here are some suggestions, feedback, ...".
>>
>>> I'm wondering what the potential
>>> for merging back to trunk is.  This is the TODO section from the
>>> BRANCH-README:
>>>
>>> TODO:
>>>
>>>  * eliminate identical prefix                 [DONE]
>>>  * eliminate identical suffix                 [DONE]
>>>  * make diff3 and diff4 use datasources_open  [DONE]
>>>     (this may eliminate the need for datasource_open, and the flag
>>>      datasource_opened in token.c#svn_diff__get_tokens)
>>>  * implement (part of) increment_pointers and
>>>    decrement_pointers with a macro            [DONE]
>>>     (offers another 10% speedup)
>>>  * integrate (some of the) low-level optimizations for prefix/suffix
>>>    scanning, proposed by stefan2 [2]          []
>>>  * revv svn_diff.h#svn_diff_fns_t             []
>>>
>>> It looks like, for the most part, any destabilizing functionality is
>>> completed, and what remains are simply optimizations.  This
>>> optimization work can probably be performed on trunk, and if so, we
>>> should merge the branch back and do the cleanup work there.  The only
>>> bit on the current list of stuff is revving svn_diff_fns_t.  Can that
>>> work be parallelized?
>>
>> Yes, you are correct. Most of the essential work is done. Further
>> optimizations can just as well be done on trunk.
>>
>> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
>> admit that I don't really know (yet) how to go about that. Very early
>> during the branch work, danielsh pointed out that I modified this
>> public struct (vtable for reading data from datasources), so it should
>> be revved. Is it listed/documented somewhere what should be done for
>> that (community guide perhaps)?
>>
>> (one slightly related note to this: I introduced the function
>> datasources_open, to open the multiple datasources at once (as opposed
>> to the original datasource_open, which only opens one datasource). But
>> maybe the new function should be renamed to datasource_open_all or
>> datasources_open_all or something, to make it stand out a little bit
>> more).
>>
>>> I'm not trying to rush the work, nor do I think it is essential for
>>> 1.7, but it feels like a pretty good performance increase, and one
>>> that we shouldn't have any problem shipping.  (If there are currently
>>> API uncertainties, than it would be good to wait until 1.7.x branches,
>>> though.)
>>
>> Yes, I think it's quite ready to be merged, and could provide a very
>> significant performance increase (for diff, merge and blame).
>>
>> The API is stable now, I think, except maybe for the name of the
>> datasources_open function (see above). If we decide to go (for
>> optimizations reasons) for specialized prefix/suffix scanning
>> functions for 2, 3 or 4 datasources, I think it's best to keep the
>> generic datasources_open API (with an array of datasources), and only
>> split up between specialized versions in the implementation.
>
> The API is now rev'd, and I caught the branch up with trunk in
> r1064459.  So it looks like we're ready to merge!
>
> Johan, would you like to do the honors?

Thanks.

I'm not sure: shouldn't we wait for a little bit more review, or can
that happen after integration on trunk (or reviewing the reintegration
commit itself or something)? E.g. stefan2 said he was going to take a
look during his travel time next week.

And one more thing came to mind with regards to the new API
(datasources_open function): currently it only supports up to 4
datasources, so not an arbitrary number of datasources (the
implementation in diff_file.c assumes that anyway, because it has to
use some local array variables, so needs a fixed array length).

I guess that's ok, because existing code in diff_file.c also already
assumes that (e.g. the svn_diff__file_baton_t struct, containing an
array of 4 file_info structs). And of course, there are currently no
know usages of that API with more than 4 datasources (diff, diff3 and
diff4).

Should this restriction be documented in the docstring of
datasources_open withing svn_diff.h#svn_diff_fns2_t or something?

Cheers,

-- 
Johan
Received on 2011-01-28 13:50:48 CET

This is an archived mail posted to the Subversion Dev mailing list.