Re: My take on the diff-optimizations-bytes branch

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Mon, 03 Jan 2011 12:03:43 +0100

On 03.01.2011 02:14, Johan Corveleyn wrote:
> On Mon, Jan 3, 2011 at 12:13 AM, Johan Corveleyn<jcorvel_at_gmail.com> wrote:
>> On Sun, Jan 2, 2011 at 10:04 PM, Johan Corveleyn<jcorvel_at_gmail.com> wrote:
>>> On Sat, Jan 1, 2011 at 11:25 PM, Stefan Fuhrmann
>>> <stefanfuhrmann_at_alice-dsl.de> wrote:
>>>> Hi Johan,
>>>>
>>>> Thursday night I did something stupid and had a look at how
>>>> svn blame could be made faster based on the HEAD code in
>>>> your branch.
>>>>
>>>> One night and most of the following day later, I think I made it
>>>> a few percent faster now. Some of the code I committed directly
>>>> to /trunk and you need to pull these changes into your branch
>>>> to compile the attached patch.
>>>>
>>>> Feel free to test it, take it apart and morph it any way you like --
>>>> I know the patch isn't very pretty in the first place. I tested the
>>>> patch on x64 LINUX and would like to hear whether it at least
>>>> somewhat improved performance on your system for your
>>>> svn blame config.xml use-case.
>>>>
>>>> -- Stefan^2.
>>>>
>>>> [[[
>>>> Speed-up of datasource_open() and its sub-routines
>>>> by a series of optimizations:
>>>>
>>>> * allocate the file[] array on stack instead of heap
>>>> (all members become directly addressible through
>>>> the stack pointer because all static sub-functions
>>>> will actually be in-lined)
>>>> * minor re-arragements in arithmetic expressions to
>>>> maximize reuse of results (e.g. in INCREMENT_POINTERS)
>>>> * move hot spots to seperate functions and provide a
>>>> specialized version for file_len == 2
>>>> (many loops collapse to a single execution, other
>>>> values can be hard-coded, etc.)
>>>> * use seperate loops to process data within a chunk
>>>> so we don't need to call INCREMENT_POINTERS& friends
>>>> that frequently
>>>> * scan / compare strings in machine-word granularity
>>>> instead of bytes
>>>> ]]]
>>> Hi Stefan,
>>>
>>> Thanks for taking a look. I really appreciate it.
>>>
>>> When I saw your first couple of commits, to the adler32 stuff, I
>>> already thought: hmmm, he's up to something :-). And after I saw your
>>> change to eol.c#svn_eol__find_eol_start, reading per machine-word, I
>>> was thinking: hey, I could really use something like that in the
>>> prefix/suffix scanning. Nice ... :-)
>>>
>>> (I had some trouble applying your patch. It doesn't apply cleanly
>>> anymore to HEAD of the branch (but most hunks were applied correctly,
>>> and I could manually change the rest, so no problem)).
>>>
>>> However, first tests are not so great. In fact, it's about 25% slower
>>> (when blaming my settings.xml file with 2200 revisions, it's spending
>>> about 90 seconds in diff, vs. around 72 with HEAD of
>>> diff-optimizations-bytes).
>>>
>>> Analyzing further, I can see it's spending significantly less time in
>>> prefix/suffix scanning, but more time in token.c#svn_diff__get_tokens
>>> (which is where the original algorithm gets the tokens/lines and
>>> inserts them into the "token tree"). This tells me it's not working
>>> correctly: either prefix/suffix scanning fails too soon, so there's
>>> much more left for the regular algorithm. Or it's just not functioning
>>> correctly.
>>>
>>> Looking at the result of the blame operation, it seems it's the
>>> latter. The final result of the blame is not correct anymore.
>>>
>>> I'll try to look some more into it, to see what's going wrong. Maybe
>>> you can also see it with a simple diff of a large file ... (for a good
>>> example in svn's own codebase, you could try
>>> subversion/tests/cmdline/merge-tests.py (pretty large, and almost 700
>>> revisions)).
>> Ok, by eliminating parts of the new stuff, I found out the problem is
>> in scan_backward_fast (part of suffix scanning). If I bypass that, and
>> always take scan_backward_slow, it works.
>>
>> And it's fast too! It's taking only 58 seconds in "diff", vs. 72 for
>> the normal version. Splitting that up further, it's taking only 28
>> seconds in prefix/suffix scanning, vs. ~47 for the normal version (for
>> some reason, the lcs processing took a little longer, but that's
>> probably unrelated (only did a single run)). And that's with
>> scan_backward_slow. That's pretty amazing.
On x64, datasource_open() got about 10x faster
but my test-case was the TSVN changelog where
we (mainly) insert at the beginning instead of
close to end of the file. So, one broken scan
direction seems completely plausible ;)
>> The code of scan_backward_fast is pretty difficult to understand for
>> me. So I'm not sure if I can spot the problem. I'll continue staring
>> at it for a little while ...
> Ok, I can't find it right now. There must be some functional
> difference between scan_backward_fast and scan_backward_slow.
One way to get closer to the problem:

* create a backup of the current rename scan_backward_fast code
* copy scan_backward_slow to scan_backward_fast
* replace all file_len with a hard-coded "2"
-> can give a 50% boost depending on optimizer cleverness
* step-by-step apply changes from the old scan_backward_fast code

> For now, some feedback on the rest of the patch:
>
> [[[
>> Index: subversion/libsvn_diff/diff_file.c
>> ===================================================================
>> --- subversion/libsvn_diff/diff_file.c (revision 1054044)
>> +++ subversion/libsvn_diff/diff_file.c (working copy)
> < ... snip ...>
>> @@ -363,53 +364,152 @@
>> /* Check whether one of the FILEs has its pointers at EOF (this is the case if
>> * one of them has curp == endp (this can only happen at the last chunk)) */
>> static svn_boolean_t
>> -is_one_at_eof(struct file_info *file[], apr_size_t file_len)
>> +is_one_at_eof(struct file_info file[], apr_size_t file_len)
>> {
>> apr_size_t i;
>>
>> for (i = 0; i< file_len; i++)
>> - if (file[i]->curp == file[i]->endp)
>> + if (file[i].curp == file[i].endp)
>> return TRUE;
>>
>> return FALSE;
>> }
>>
>> -/* Find the prefix which is identical between all elements of the FILE array.
>> - * Return the number of prefix lines in PREFIX_LINES. REACHED_ONE_EOF will be
>> - * set to TRUE if one of the FILEs reached its end while scanning prefix,
>> - * i.e. at least one file consisted entirely of prefix. Otherwise,
>> - * REACHED_ONE_EOF is set to FALSE.
>> - *
>> - * After this function is finished, the buffers, chunks, curp's and endp's
>> - * of the FILEs are set to point at the first byte after the prefix. */
>> +/* Quickly determine whether there is a new-line char in CHUNK.
>> + * (mainly copy-n-paste from find_eol_start).
>> + */
>> +#if APR_SIZEOF_VOIDP == 8
>> +# define LOWER_7BITS_SET 0x7f7f7f7f7f7f7f7f
>> +# define BIT_7_SET 0x8080808080808080
>> +# define R_MASK 0x0a0a0a0a0a0a0a0a
>> +# define N_MASK 0x0d0d0d0d0d0d0d0d
>> +#else
>> +# define LOWER_7BITS_SET 0x7f7f7f7f
>> +# define BIT_7_SET 0x80808080
>> +# define R_MASK 0x0a0a0a0a
>> +# define N_MASK 0x0d0d0d0d
>> +#endif
>> +
>> +static svn_boolean_t contains_nl(apr_size_t chunk)
>> +{
>> + apr_size_t r_test = chunk ^ R_MASK;
>> + apr_size_t n_test = chunk ^ N_MASK;
>> +
>> + r_test |= (r_test& LOWER_7BITS_SET) + LOWER_7BITS_SET;
>> + n_test |= (n_test& LOWER_7BITS_SET) + LOWER_7BITS_SET;
>> +
>> + return (r_test& n_test& BIT_7_SET) != BIT_7_SET;
>> +}
>> +
>> +/* Optimized variant of the guts of find_identical_prefix for
>> + * file_len == 2.
>> + */
>> static svn_error_t *
>> -find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
>> - struct file_info *file[], apr_size_t file_len,
>> - apr_pool_t *pool)
>> +find_identical_prefix_fast(svn_boolean_t *reached_one_eof,
>> + apr_off_t *prefix_lines,
>> + struct file_info file[],
>> + apr_pool_t *pool)
>> {
>> svn_boolean_t had_cr = FALSE;
>> + apr_off_t lines = 0;
>> + apr_size_t i = 0;
>> +
>> + *reached_one_eof = FALSE;
>> +
>> + while (*file[0].curp == *file[1].curp)
>> + {
>> + if (*file[0].curp == '\r')
>> + {
>> + lines++;
>> + had_cr = TRUE;
>> + }
>> + else if (*file[0].curp == '\n'&& !had_cr)
>> + {
>> + lines++;
>> + }
>> + else
>> + {
>> + had_cr = FALSE;
>> + }
>> +
>> + INCREMENT_POINTERS(file, 2, pool);
>> +
>> +#if SVN_UNALIGNED_ACCESS_IS_OK
>> +
>> + /* Skip quickly over the stuff between EOLs. */
>> +
>> + while ( (file[0].curp + sizeof(apr_size_t)< file[0].endp)
>> +&& (file[1].curp + sizeof(apr_size_t)< file[1].endp)
>> +&& ( *(const apr_size_t *)file[0].curp
>> + == *(const apr_size_t *)file[1].curp)
>> +&& !contains_nl(*(const apr_size_t *)file[0].curp))
>> + {
>> + file[0].curp += sizeof(apr_size_t);
>> + file[1].curp += sizeof(apr_size_t);
>> + }
>> +
>> +#endif
>> +
>> + /* curp == endp indicates EOF (this can only happen with last chunk) */
>> + if (is_one_at_eof(file, 2))
>> + {
>> + *reached_one_eof = TRUE;
>> + break;
>> + }
>> + }
>> +
>> + *prefix_lines = lines;
>> +
>> + /* If all files reached their end (i.e. are fully identical), we're done */
>> + if (file[0].curp == file[0].endp&& file[1].curp == file[1].endp)
>> + return SVN_NO_ERROR;
>> +
>> + if (had_cr)
>> + {
>> + /* Check if we ended in the middle of a \r\n for one file, but \r for
>> + another. If so, back up one byte, so the next loop will back up
>> + the entire line. Also decrement *prefix_lines, since we counted one
>> + too many for the \r. */
>> + if ((*file[0].curp == '\n') || (*file[1].curp == '\n'))
> Here (or below DECREMENT, but within this same if condition) you need:
> (*prefix_lines)--;
>
> just like in the original (and *_slow) case. As is explained in the
> comment above: if we are here, we actually counted a full prefix line
> too much, because we already counted one for a matching '\r', but the
> character after that was '\n' in one file, but a different character
> in the other. So the eol-termination is different, so this is a
> non-matching line which needs to be fully rolled back (hence also the
> extra DECREMENT here).
>
I'm not even sure the original code is correct at all.
For instance, it does not handle '\r' (Mac style) EOLs
properly. Not really sure how to fix that.
>> + DECREMENT_POINTERS(file, 2, pool);
>> + }
>> +
>> + return SVN_NO_ERROR;
>> +}
>> +
>> +/* Non-optimized variant of the guts of find_identical_prefix for
>> + * file_len != 2.
>> + */
>> +static svn_error_t *
>> +find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
>> + apr_off_t *prefix_lines,
>> + struct file_info file[], apr_size_t file_len,
>> + apr_pool_t *pool)
>> +{
>> + svn_boolean_t had_cr = FALSE;
>> svn_boolean_t is_match, reached_all_eof;
>> apr_size_t i;
>> + apr_off_t lines = 0;
>>
>> *prefix_lines = 0;
>> for (i = 1, is_match = TRUE; i< file_len; i++)
>> - is_match = is_match&& *file[0]->curp == *file[i]->curp;
>> + is_match = is_match&& *file[0].curp == *file[i].curp;
>> +
>> while (is_match)
>> {
>> /* ### TODO: see if we can take advantage of
>> diff options like ignore_eol_style or ignore_space. */
>> /* check for eol, and count */
>> - if (*file[0]->curp == '\r')
>> + if (*file[0].curp == '\r')
>> {
>> - (*prefix_lines)++;
>> + lines++;
>> had_cr = TRUE;
>> }
>> - else if (*file[0]->curp == '\n'&& !had_cr)
>> + else if (*file[0].curp == '\n'&& !had_cr)
>> {
>> - (*prefix_lines)++;
>> - had_cr = FALSE;
>> + lines++;
>> }
>> - else
>> + else
>> {
>> had_cr = FALSE;
>> }
>> @@ -422,12 +522,14 @@
>> break;
>> else
>> for (i = 1, is_match = TRUE; i< file_len; i++)
>> - is_match = is_match&& *file[0]->curp == *file[i]->curp;
>> + is_match = is_match&& *file[0].curp == *file[i].curp;
>> }
>>
>> + *prefix_lines = lines;
>> +
>> /* If all files reached their end (i.e. are fully identical), we're done */
>> for (i = 0, reached_all_eof = TRUE; i< file_len; i++)
>> - reached_all_eof = reached_all_eof&& file[i]->curp == file[i]->endp;
>> + reached_all_eof = reached_all_eof&& file[i].curp == file[i].endp;
>> if (reached_all_eof)
>> return SVN_NO_ERROR;
>>
>> @@ -440,20 +542,40 @@
>> svn_boolean_t ended_at_nonmatching_newline = FALSE;
>> for (i = 0; i< file_len; i++)
>> ended_at_nonmatching_newline = ended_at_nonmatching_newline
>> - || *file[i]->curp == '\n';
>> + || *file[i].curp == '\n';
>> if (ended_at_nonmatching_newline)
>> - {
>> - (*prefix_lines)--;
> Same as above: this (*prefix_lines)-- needs to stay, I think.
>
>> - DECREMENT_POINTERS(file, file_len, pool);
>> - }
>> + DECREMENT_POINTERS(file, file_len, pool);
>> }
>>
>> + return SVN_NO_ERROR;
>> +}
>> +
>> +/* Find the prefix which is identical between all elements of the FILE array.
>> + * Return the number of prefix lines in PREFIX_LINES. REACHED_ONE_EOF will be
>> + * set to TRUE if one of the FILEs reached its end while scanning prefix,
>> + * i.e. at least one file consisted entirely of prefix. Otherwise,
>> + * REACHED_ONE_EOF is set to FALSE.
>> + *
>> + * After this function is finished, the buffers, chunks, curp's and endp's
>> + * of the FILEs are set to point at the first byte after the prefix. */
>> +static svn_error_t *
>> +find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
>> + struct file_info file[], apr_size_t file_len,
>> + apr_pool_t *pool)
>> +{
>> + if (file_len == 2)
>> + SVN_ERR(find_identical_prefix_fast(reached_one_eof, prefix_lines,
>> + file, pool));
>> + else
>> + SVN_ERR(find_identical_prefix_slow(reached_one_eof, prefix_lines,
>> + file, file_len, pool));
>> +
> Small problem here (not really visible in practice, but potential
> change in behavior none the less): if either of the above function
> calls exited early because of reached_all_eof, this function should
> also exit early.
I'll have a look at that tonight.
> If both (all) files reached their end, they are completely identical,
> and it makes no sense to roll back the last (incomplete) line.
>
> So maybe the check for reached_all_eof should be pulled out of the
> above functions, and into this one? But then you'll also need to move
> the ended_at_nonmatching_newline stuff, so transfer the information of
> had_cr. Maybe you've considered that already ...
>
> The rest looks fine from a correctness standpoint (and tests
> correctly), except for scan_backwards_fast.
>
> Also, it would really help if you could split up this patch (though
> this was certainly a good way to try it out and give me an idea of the
> possibilities). It would be interesting to see where the biggest gains
> are coming from (I'm guessing from the "per-machine-word"
> reading/comparing; I'd like to try that first, maybe together with the
> allocating of the file[] on the stack; I'd like to avoid
> special-casing file_len==2, splitting up functions in *_slow and
> *_fast, because it makes it a lot more difficult imho (so I'd only
> like to do that if it's a significant contributor)). But if you want
> to leave that as an exercise for the reader, that's fine as well :-).
>
> Thanks.
>
> Cheers,
> Johan

-- Stefan^2.
Received on 2011-01-03 12:18:07 CET

This message: [ Message body ]
Next message: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Previous message: Kamesh Jayachandran: "Re: [PATCH] tweaked svnrdump to handle junk values with "--version""
In reply to: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Next in thread: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"
Reply: Johan Corveleyn: "Re: My take on the diff-optimizations-bytes branch"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]