Re: My take on the diff-optimizations-bytes branch

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Sat, 08 Jan 2011 06:50:59 +0100

On 03.01.2011 02:14, Johan Corveleyn wrote:
> On Mon, Jan 3, 2011 at 12:13 AM, Johan Corveleyn<jcorvel_at_gmail.com> wrote:
>> On Sun, Jan 2, 2011 at 10:04 PM, Johan Corveleyn<jcorvel_at_gmail.com> wrote:
>>> On Sat, Jan 1, 2011 at 11:25 PM, Stefan Fuhrmann
>>> <stefanfuhrmann_at_alice-dsl.de> wrote:
>>>> Hi Johan,
>>>>
>>>> Thursday night I did something stupid and had a look at how
>>>> svn blame could be made faster based on the HEAD code in
>>>> your branch.
>>>>
>>>> One night and most of the following day later, I think I made it
>>>> a few percent faster now. Some of the code I committed directly
>>>> to /trunk and you need to pull these changes into your branch
>>>> to compile the attached patch.
>>>>
>>>> Feel free to test it, take it apart and morph it any way you like --
>>>> I know the patch isn't very pretty in the first place. I tested the
>>>> patch on x64 LINUX and would like to hear whether it at least
>>>> somewhat improved performance on your system for your
>>>> svn blame config.xml use-case.
>>>>
>>>> -- Stefan^2.
>>>>
>>>> [[[
>>>> Speed-up of datasource_open() and its sub-routines
>>>> by a series of optimizations:
>>>>
>>>> * allocate the file[] array on stack instead of heap
>>>> (all members become directly addressible through
>>>> the stack pointer because all static sub-functions
>>>> will actually be in-lined)
>>>> * minor re-arragements in arithmetic expressions to
>>>> maximize reuse of results (e.g. in INCREMENT_POINTERS)
>>>> * move hot spots to seperate functions and provide a
>>>> specialized version for file_len == 2
>>>> (many loops collapse to a single execution, other
>>>> values can be hard-coded, etc.)
>>>> * use seperate loops to process data within a chunk
>>>> so we don't need to call INCREMENT_POINTERS& friends
>>>> that frequently
>>>> * scan / compare strings in machine-word granularity
>>>> instead of bytes
>>>> ]]]
>>> Hi Stefan,
>>>
>>> Thanks for taking a look. I really appreciate it.
>>>
>>> When I saw your first couple of commits, to the adler32 stuff, I
>>> already thought: hmmm, he's up to something :-). And after I saw your
>>> change to eol.c#svn_eol__find_eol_start, reading per machine-word, I
>>> was thinking: hey, I could really use something like that in the
>>> prefix/suffix scanning. Nice ... :-)
>>>
>>> (I had some trouble applying your patch. It doesn't apply cleanly
>>> anymore to HEAD of the branch (but most hunks were applied correctly,
>>> and I could manually change the rest, so no problem)).
>>>
>>> However, first tests are not so great. In fact, it's about 25% slower
>>> (when blaming my settings.xml file with 2200 revisions, it's spending
>>> about 90 seconds in diff, vs. around 72 with HEAD of
>>> diff-optimizations-bytes).
>>>
>>> Analyzing further, I can see it's spending significantly less time in
>>> prefix/suffix scanning, but more time in token.c#svn_diff__get_tokens
>>> (which is where the original algorithm gets the tokens/lines and
>>> inserts them into the "token tree"). This tells me it's not working
>>> correctly: either prefix/suffix scanning fails too soon, so there's
>>> much more left for the regular algorithm. Or it's just not functioning
>>> correctly.
>>>
>>> Looking at the result of the blame operation, it seems it's the
>>> latter. The final result of the blame is not correct anymore.
>>>
>>> I'll try to look some more into it, to see what's going wrong. Maybe
>>> you can also see it with a simple diff of a large file ... (for a good
>>> example in svn's own codebase, you could try
>>> subversion/tests/cmdline/merge-tests.py (pretty large, and almost 700
>>> revisions)).
>> Ok, by eliminating parts of the new stuff, I found out the problem is
>> in scan_backward_fast (part of suffix scanning). If I bypass that, and
>> always take scan_backward_slow, it works.
>>
>> And it's fast too! It's taking only 58 seconds in "diff", vs. 72 for
>> the normal version. Splitting that up further, it's taking only 28
>> seconds in prefix/suffix scanning, vs. ~47 for the normal version (for
>> some reason, the lcs processing took a little longer, but that's
>> probably unrelated (only did a single run)). And that's with
>> scan_backward_slow. That's pretty amazing.
>>
>> The code of scan_backward_fast is pretty difficult to understand for
>> me. So I'm not sure if I can spot the problem. I'll continue staring
>> at it for a little while ...
> Ok, I can't find it right now. There must be some functional
> difference between scan_backward_fast and scan_backward_slow.
>
> For now, some feedback on the rest of the patch:
>
> [[[
>
>> - DECREMENT_POINTERS(file, file_len, pool);
>> - }
>> + DECREMENT_POINTERS(file, file_len, pool);
>> }
>>
>> + return SVN_NO_ERROR;
>> +}
>> +
>> +/* Find the prefix which is identical between all elements of the FILE array.
>> + * Return the number of prefix lines in PREFIX_LINES. REACHED_ONE_EOF will be
>> + * set to TRUE if one of the FILEs reached its end while scanning prefix,
>> + * i.e. at least one file consisted entirely of prefix. Otherwise,
>> + * REACHED_ONE_EOF is set to FALSE.
>> + *
>> + * After this function is finished, the buffers, chunks, curp's and endp's
>> + * of the FILEs are set to point at the first byte after the prefix. */
>> +static svn_error_t *
>> +find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
>> + struct file_info file[], apr_size_t file_len,
>> + apr_pool_t *pool)
>> +{
>> + if (file_len == 2)
>> + SVN_ERR(find_identical_prefix_fast(reached_one_eof, prefix_lines,
>> + file, pool));
>> + else
>> + SVN_ERR(find_identical_prefix_slow(reached_one_eof, prefix_lines,
>> + file, file_len, pool));
>> +
> Small problem here (not really visible in practice, but potential
> change in behavior none the less): if either of the above function
> calls exited early because of reached_all_eof, this function should
> also exit early.
Yep, that is just an artifact caused by moving code
into a sub-function.
> If both (all) files reached their end, they are completely identical,
> and it makes no sense to roll back the last (incomplete) line.
>
> So maybe the check for reached_all_eof should be pulled out of the
> above functions, and into this one? But then you'll also need to move
> the ended_at_nonmatching_newline stuff, so transfer the information of
> had_cr. Maybe you've considered that already ...
>
> The rest looks fine from a correctness standpoint (and tests
> correctly), except for scan_backwards_fast.
>
> Also, it would really help if you could split up this patch (though
> this was certainly a good way to try it out and give me an idea of the
> possibilities).
Giving you some ideas was the main point of it.

If I find some time, I will try to reduce the patch
(no fast / slow separation). But since the heap
to stack conversion spreads across all of the
code, it is hard to split the patch.

> It would be interesting to see where the biggest gains
> are coming from (I'm guessing from the "per-machine-word"
> reading/comparing; I'd like to try that first, maybe together with the
> allocating of the file[] on the stack; I'd like to avoid
> special-casing file_len==2, splitting up functions in *_slow and
> *_fast, because it makes it a lot more difficult imho (so I'd only
> like to do that if it's a significant contributor)). But if you want
> to leave that as an exercise for the reader, that's fine as well :-).
Exercise is certainly not a bad thing ;)

But I think, the stack variable is certainly helpful
and easy to do. While "chunky" operation gives
a *big* boost, it is much more difficult to code if
you need to compare multiple sources. It should
be possible, though.

The great advantage of the file_len==2 case is
that this is the only time-critical operation (a
true merge over long files is rare). Also, many
constructs collapse if the count is fixed to 2:

for (i = 1, is_match = TRUE; i< file_len; i++)
   is_match = is_match&& *file[0].curp == *file[i].curp;
while(is_match)
{
...
   for (i = 1, is_match = TRUE; i< file_len; i++)
     is_match = is_match&& *file[0].curp == *file[i].curp;
}

becomes

while(*file[0].curp == *file[1].curp)
{
}

But basically, we will need to run some more tests.

-- Stefan^2.
Received on 2011-01-08 06:51:34 CET

This message: [ Message body ]
Next message: Ramkumar Ramachandra: "[PATCH] Makefile.svn: Make stock SVN executable configurable"
Previous message: Stefan Fuhrmann: "Re: My take on the diff-optimizations-bytes branch"
In reply to: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Next in thread: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Reply: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Reply: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]