On Fri, Jan 28, 2011 at 6:49 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Fri, Jan 28, 2011 at 5:51 AM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
>> On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>>> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
>>>> Johan (and other interested parties),
>>>> I've been following some of the commits to the
>>>> diff-optimizations-branch with interest. While I've not reviewed them
>>>> for technical merit, it appears that others have, and that there is
>>>> good work going on in the branch.
>>>
>>> Thanks for your interest.
>>>
>>> It might be good to get a little bit more review on the whole, I
>>> think. A lot of people have read parts of the code, but if I remember
>>> correctly most (if not all) of them have said things like "I haven't
>>> reviewed it in detail, but here are some suggestions, feedback, ...".
>>>
>>>> I'm wondering what the potential
>>>> for merging back to trunk is. This is the TODO section from the
>>>> BRANCH-README:
>>>>
>>>> TODO:
>>>>
>>>> * eliminate identical prefix [DONE]
>>>> * eliminate identical suffix [DONE]
>>>> * make diff3 and diff4 use datasources_open [DONE]
>>>> (this may eliminate the need for datasource_open, and the flag
>>>> datasource_opened in token.c#svn_diff__get_tokens)
>>>> * implement (part of) increment_pointers and
>>>> decrement_pointers with a macro [DONE]
>>>> (offers another 10% speedup)
>>>> * integrate (some of the) low-level optimizations for prefix/suffix
>>>> scanning, proposed by stefan2 [2] []
>>>> * revv svn_diff.h#svn_diff_fns_t []
>>>>
>>>> It looks like, for the most part, any destabilizing functionality is
>>>> completed, and what remains are simply optimizations. This
>>>> optimization work can probably be performed on trunk, and if so, we
>>>> should merge the branch back and do the cleanup work there. The only
>>>> bit on the current list of stuff is revving svn_diff_fns_t. Can that
>>>> work be parallelized?
>>>
>>> Yes, you are correct. Most of the essential work is done. Further
>>> optimizations can just as well be done on trunk.
>>>
>>> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
>>> admit that I don't really know (yet) how to go about that. Very early
>>> during the branch work, danielsh pointed out that I modified this
>>> public struct (vtable for reading data from datasources), so it should
>>> be revved. Is it listed/documented somewhere what should be done for
>>> that (community guide perhaps)?
>>>
>>> (one slightly related note to this: I introduced the function
>>> datasources_open, to open the multiple datasources at once (as opposed
>>> to the original datasource_open, which only opens one datasource). But
>>> maybe the new function should be renamed to datasource_open_all or
>>> datasources_open_all or something, to make it stand out a little bit
>>> more).
>>>
>>>> I'm not trying to rush the work, nor do I think it is essential for
>>>> 1.7, but it feels like a pretty good performance increase, and one
>>>> that we shouldn't have any problem shipping. (If there are currently
>>>> API uncertainties, than it would be good to wait until 1.7.x branches,
>>>> though.)
>>>
>>> Yes, I think it's quite ready to be merged, and could provide a very
>>> significant performance increase (for diff, merge and blame).
>>>
>>> The API is stable now, I think, except maybe for the name of the
>>> datasources_open function (see above). If we decide to go (for
>>> optimizations reasons) for specialized prefix/suffix scanning
>>> functions for 2, 3 or 4 datasources, I think it's best to keep the
>>> generic datasources_open API (with an array of datasources), and only
>>> split up between specialized versions in the implementation.
>>
>> The API is now rev'd, and I caught the branch up with trunk in
>> r1064459. So it looks like we're ready to merge!
>>
>> Johan, would you like to do the honors?
>
> Thanks.
>
> I'm not sure: shouldn't we wait for a little bit more review, or can
> that happen after integration on trunk (or reviewing the reintegration
> commit itself or something)? E.g. stefan2 said he was going to take a
> look during his travel time next week.
Sounds good.
> And one more thing came to mind with regards to the new API
> (datasources_open function): currently it only supports up to 4
> datasources, so not an arbitrary number of datasources (the
> implementation in diff_file.c assumes that anyway, because it has to
> use some local array variables, so needs a fixed array length).
I noticed these variables. It seems that we should get rid of the
magic number ('4'), and replace it either with a sizeof() or a macro
from somewhere. Not sure what the best approach is, but arbitrary
numbers tickle my spidey sense.
> I guess that's ok, because existing code in diff_file.c also already
> assumes that (e.g. the svn_diff__file_baton_t struct, containing an
> array of 4 file_info structs). And of course, there are currently no
> know usages of that API with more than 4 datasources (diff, diff3 and
> diff4).
>
> Should this restriction be documented in the docstring of
> datasources_open withing svn_diff.h#svn_diff_fns2_t or something?
Sounds reasonable.
-Hyrum
Received on 2011-01-28 16:20:20 CET