On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hyrum_at_hyrumwright.org> wrote:
> Johan (and other interested parties),
> I've been following some of the commits to the
> diff-optimizations-branch with interest. While I've not reviewed them
> for technical merit, it appears that others have, and that there is
> good work going on in the branch.
Thanks for your interest.
It might be good to get a little bit more review on the whole, I
think. A lot of people have read parts of the code, but if I remember
correctly most (if not all) of them have said things like "I haven't
reviewed it in detail, but here are some suggestions, feedback, ...".
> I'm wondering what the potential
> for merging back to trunk is. This is the TODO section from the
> BRANCH-README:
>
> TODO:
>
> * eliminate identical prefix [DONE]
> * eliminate identical suffix [DONE]
> * make diff3 and diff4 use datasources_open [DONE]
> (this may eliminate the need for datasource_open, and the flag
> datasource_opened in token.c#svn_diff__get_tokens)
> * implement (part of) increment_pointers and
> decrement_pointers with a macro [DONE]
> (offers another 10% speedup)
> * integrate (some of the) low-level optimizations for prefix/suffix
> scanning, proposed by stefan2 [2] []
> * revv svn_diff.h#svn_diff_fns_t []
>
> It looks like, for the most part, any destabilizing functionality is
> completed, and what remains are simply optimizations. This
> optimization work can probably be performed on trunk, and if so, we
> should merge the branch back and do the cleanup work there. The only
> bit on the current list of stuff is revving svn_diff_fns_t. Can that
> work be parallelized?
Yes, you are correct. Most of the essential work is done. Further
optimizations can just as well be done on trunk.
Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
admit that I don't really know (yet) how to go about that. Very early
during the branch work, danielsh pointed out that I modified this
public struct (vtable for reading data from datasources), so it should
be revved. Is it listed/documented somewhere what should be done for
that (community guide perhaps)?
(one slightly related note to this: I introduced the function
datasources_open, to open the multiple datasources at once (as opposed
to the original datasource_open, which only opens one datasource). But
maybe the new function should be renamed to datasource_open_all or
datasources_open_all or something, to make it stand out a little bit
more).
> I'm not trying to rush the work, nor do I think it is essential for
> 1.7, but it feels like a pretty good performance increase, and one
> that we shouldn't have any problem shipping. (If there are currently
> API uncertainties, than it would be good to wait until 1.7.x branches,
> though.)
Yes, I think it's quite ready to be merged, and could provide a very
significant performance increase (for diff, merge and blame).
The API is stable now, I think, except maybe for the name of the
datasources_open function (see above). If we decide to go (for
optimizations reasons) for specialized prefix/suffix scanning
functions for 2, 3 or 4 datasources, I think it's best to keep the
generic datasources_open API (with an array of datasources), and only
split up between specialized versions in the implementation.
> Anyway, I just really like the concept / implementation of the branch,
> and I'm excited to get it into the hands of users. Thoughts?
>
> -Hyrum
>
Cheers,
--
Johan
Received on 2011-01-26 03:32:09 CET