On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> On 03.01.2011 02:14, Johan Corveleyn wrote:
>> It would be interesting to see where the biggest gains
>> are coming from (I'm guessing from the "per-machine-word"
>> reading/comparing; I'd like to try that first, maybe together with the
>> allocating of the file[] on the stack; I'd like to avoid
>> special-casing file_len==2, splitting up functions in *_slow and
>> *_fast, because it makes it a lot more difficult imho (so I'd only
>> like to do that if it's a significant contributor)). But if you want
>> to leave that as an exercise for the reader, that's fine as well :-).
>
> Exercise is certainly not a bad thing ;)
>
> But I think the stack variable is certainly helpful
> and easy to do. While "chunky" operation gives
> a *big* boost, it is much more difficult to code if
> you need to compare multiple sources. It should
> be possible, though.
Ok, I finished the exercise :-).
- File info structures on the stack instead of on the heap: committed in
r1060625. Gives a 10% boost.
- Chunky operation for multiple sources: committed in r1062532. Gives a
~250% boost on my machine (a sketch of the idea follows below).
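For anyone who wants the gist of the "chunky" approach without digging
through the diff: the idea is simply to compare one machine word per
iteration instead of one byte, falling back to bytewise comparison for
the tail. Here is a minimal standalone sketch for two files; all names
are illustrative rather than the committed svn_diff code, and it uses
memcpy word loads instead of the SVN_UNALIGNED_ACCESS_IS_OK pointer
cast so it stays portable:
[[[
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for svn_diff's per-file info struct;
   the caller puts these on the stack, as in r1060625. */
struct file_info_sketch {
  const char *curp;   /* current position in the buffer */
  const char *endp;   /* one past the last byte */
};

/* Advance both cursors past their common prefix: word-at-a-time
   while a whole word remains in both buffers, then byte-at-a-time. */
static void
find_identical_prefix_sketch(struct file_info_sketch *a,
                             struct file_info_sketch *b)
{
  while ((size_t)(a->endp - a->curp) >= sizeof(size_t)
         && (size_t)(b->endp - b->curp) >= sizeof(size_t))
    {
      size_t wa, wb;
      memcpy(&wa, a->curp, sizeof(wa));  /* alignment-safe word load */
      memcpy(&wb, b->curp, sizeof(wb));
      if (wa != wb)
        break;                  /* the difference is inside this word */
      a->curp += sizeof(size_t);
      b->curp += sizeof(size_t);
    }
  while (a->curp < a->endp && b->curp < b->endp
         && *a->curp == *b->curp)
    {
      a->curp++;
      b->curp++;
    }
}
]]]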
For suffix scanning, I made a couple of changes to the logic you sent
in your patch (as you may remember, there was a problem with the
implementation of suffix scanning [1]). The original (in
scan_backward_fast) was:
[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+ if ( (file[0].curp - sizeof(apr_size_t) < file[0].curp)
+ && (file[1].curp - sizeof(apr_size_t) < file[1].curp))
+ {
+ do
+ {
+ file[0].curp -= sizeof(apr_size_t);
+ file[1].curp -= sizeof(apr_size_t);
+ }
+ while ( (file[0].curp >= min_curp0)
+ && (file[1].curp >= min_curp1)
+ && ( *(const apr_size_t*)(file[0].curp + 1)
+ == *(const apr_size_t*)(file[1].curp + 1)));
+
+ file[0].curp += sizeof(apr_size_t);
+ file[1].curp += sizeof(apr_size_t);
+ }
]]]
I changed this to (the N-file equivalent of):
[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+ if ( (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+ && (file[1].curp - sizeof(apr_size_t) >= min_curp1))
+ {
+ do
+ {
+ file[0].curp -= sizeof(apr_size_t);
+ file[1].curp -= sizeof(apr_size_t);
+ }
+ while ( (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+ && (file[1].curp - sizeof(apr_size_t) >= min_curp1)
+ && ( *(const apr_size_t*)(file[0].curp + 1)
+ == *(const apr_size_t*)(file[1].curp + 1)));
+
+ file[0].curp += sizeof(apr_size_t);
+ file[1].curp += sizeof(apr_size_t);
+ }
]]]
(So: I changed the if-comparisons before the do-while loop to make sure
there is room to subtract a sizeof(apr_size_t). The original condition,
curp - sizeof(apr_size_t) < curp, is always true for a nonzero
subtraction (barring pointer wraparound), so it never actually guarded
anything. And I changed the comparisons with min_curpX in the while
condition, to check whether there is still room to subtract a
sizeof(apr_size_t) before the next iteration.)
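Spelled out, the N-file shape of that fixed bounds check is roughly the
following. This is only a sketch under assumed names (cursor_sketch,
min_curp), not the committed code, and memcpy again stands in for the
unaligned-access cast:
[[[
#include <stddef.h>
#include <string.h>

struct cursor_sketch {
  const char *curp;      /* last byte of the common suffix so far */
  const char *min_curp;  /* lowest position we may ever point at */
};

/* Step all cursors back a word at a time while (a) every file still
   has room for a whole word above min_curp and (b) the word ending
   at each cursor matches file 0's. */
static void
scan_backward_chunky_sketch(struct cursor_sketch file[], size_t file_len)
{
  size_t i;

  for (;;)
    {
      size_t w0;

      /* Still room to subtract a sizeof(size_t) in every file? */
      for (i = 0; i < file_len; i++)
        if ((size_t)(file[i].curp - file[i].min_curp) < sizeof(size_t))
          return;

      /* Does the word ending at curp match file 0's in every file? */
      memcpy(&w0, file[0].curp + 1 - sizeof(size_t), sizeof(w0));
      for (i = 1; i < file_len; i++)
        {
          size_t w;
          memcpy(&w, file[i].curp + 1 - sizeof(size_t), sizeof(w));
          if (w != w0)
            return;  /* mismatch is inside this word; curp stays put */
        }

      /* All words matched: step every cursor back a whole word. */
      for (i = 0; i < file_len; i++)
        file[i].curp -= sizeof(size_t);
    }
}
]]]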
After that, all tests went fine.
Now, you suggested in a previous post in this thread that hardcoding
file_len to 2 should speed it up by up to 50%. And you are correct: on
my machine it's sped up by ~40% if I replace all occurrences of
file_len in find_identical_* with 2.
So, that leads me to conclude that I should probably throw away the
generic code, and replace it with 3 variants: find_identical_*2,
find_identical_*3 and find_identical_*4 (once I write *2, the rest
should be mostly copy-n-paste). As you said, this should also vastly
simplify the code (drop the for loops, simpler conditionals, ...).
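To make concrete what the specialization buys: with file_len fixed at
2, the per-position "do all files match?" test stops being a loop at
all. Schematically (hypothetical names, not the actual committed
functions):
[[[
#include <stddef.h>

struct cursor_sketch { const char *curp; };

/* Generic test, as the current find_identical_* loops do it:
   an induction variable and a branch per file, per position. */
static int
all_match_generic(const struct cursor_sketch file[], size_t file_len)
{
  size_t i;
  for (i = 1; i < file_len; i++)
    if (*file[i].curp != *file[0].curp)
      return 0;
  return 1;
}

/* What a hardcoded find_identical_*2 gets to use instead:
   a single comparison the compiler can keep in registers. */
static int
all_match_2(const struct cursor_sketch file[])
{
  return *file[0].curp == *file[1].curp;
}
]]]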
Cheers,
--
Johan
[1] http://svn.haxx.se/dev/archive-2011-01/0022.shtml