On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
>> Johan (and other interested parties),
>> I've been following some of the commits to the
>> diff-optimizations-branch with interest. While I've not reviewed them
>> for technical merit, it appears that others have, and that there is
>> good work going on in the branch.
>
> Thanks for your interest.
>
> It might be good to get a little bit more review on the whole, I
> think. A lot of people have read parts of the code, but if I remember
> correctly most (if not all) of them have said things like "I haven't
> reviewed it in detail, but here are some suggestions, feedback, ...".
>
>> I'm wondering what the potential
>> for merging back to trunk is. This is the TODO section from the
>> BRANCH-README:
>>
>> TODO:
>>
>> * eliminate identical prefix [DONE]
>> * eliminate identical suffix [DONE]
>> * make diff3 and diff4 use datasources_open [DONE]
>> (this may eliminate the need for datasource_open, and the flag
>> datasource_opened in token.c#svn_diff__get_tokens)
>> * implement (part of) increment_pointers and
>> decrement_pointers with a macro [DONE]
>> (offers another 10% speedup)
>> * integrate (some of the) low-level optimizations for prefix/suffix
>> scanning, proposed by stefan2 [2] []
>> * revv svn_diff.h#svn_diff_fns_t []
>>
>> It looks like, for the most part, any destabilizing functionality is
>> completed, and what remains are simply optimizations. This
>> optimization work can probably be performed on trunk, and if so, we
>> should merge the branch back and do the cleanup work there. The only
>> bit on the current list of stuff is revving svn_diff_fns_t. Can that
>> work be parallelized?
>
> Yes, you are correct. Most of the essential work is done. Further
> optimizations can just as well be done on trunk.
>
> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
> admit that I don't really know (yet) how to go about that. Very early
> during the branch work, danielsh pointed out that I modified this
> public struct (vtable for reading data from datasources), so it should
> be revved. Is it listed/documented somewhere what should be done for
> that (community guide perhaps)?
>
> (one slightly related note to this: I introduced the function
> datasources_open, to open the multiple datasources at once (as opposed
> to the original datasource_open, which only opens one datasource). But
> maybe the new function should be renamed to datasource_open_all or
> datasources_open_all or something, to make it stand out a little bit
> more).
>
>> I'm not trying to rush the work, nor do I think it is essential for
>> 1.7, but it feels like a pretty good performance increase, and one
>> that we shouldn't have any problem shipping. (If there are currently
>> API uncertainties, than it would be good to wait until 1.7.x branches,
>> though.)
>
> Yes, I think it's quite ready to be merged, and could provide a very
> significant performance increase (for diff, merge and blame).
>
> The API is stable now, I think, except maybe for the name of the
> datasources_open function (see above). If we decide to go (for
> optimizations reasons) for specialized prefix/suffix scanning
> functions for 2, 3 or 4 datasources, I think it's best to keep the
> generic datasources_open API (with an array of datasources), and only
> split up between specialized versions in the implementation.
The API is now rev'd, and I caught the branch up with trunk in
r1064459. So it looks like we're ready to merge!
Johan, would you like to do the honors?
-Hyrum
Received on 2011-01-28 05:52:10 CET