On Sat, Aug 09, 2014 at 08:08:21PM +0200, Branko Čibej wrote:
> On 09.08.2014 15:55, Alan Modra wrote:
> > Using current git sources compiled with gcc-4.9.1 or mainline gcc at
> > -O3 results in failure of random-test.
> >
> > On powerpc64le we see
> > PASS: random-test 1: random delta test
> > svn_tests: E200006: mismatch at position xxxxx
> > FAIL: random-test 2: random combine delta test
> >
> > x86_64 gives
> > PASS: random-test 1: random delta test
> > svn_tests: E200006: Test crashed (run in debugger with '--allow-segfaults')
> > FAIL: random-test 2: random combine delta test
> >
> > After some digging, I narrowed the failure down to
> > subversion/libsvn_delta/text_delta.c, and the specific -O3 options
> > causing the failure to -ftree-loop-vectorize -fvect-cost-model=dynamic.
> > At first I thought I'd found a vectorizer bug, but the real problem is
> > a bug in the source, specifically this line in patterning_copy:
> >
> > *(apr_uint32_t *)(target) = *(apr_uint32_t *)(source);
> >
> > Quoting from ISO/IEC 9899:1999
> > 6.3.2.3 Pointers
> > ...
> > 7 A pointer to an object or incomplete type may be converted to a
> > pointer to a different object or incomplete type. If the resulting
> > pointer is not correctly aligned for the pointed-to type, the behavior
> > is undefined.
> >
> > So here we have undefined behaviour if "source" and "target" are not
> > 4-byte aligned.. Fixed as follows.
>
> Nope. This code won't fly because it's not portable across compilers and
> platforms.
In that case, consider my email a bug report. I've done the hard work
debugging and analyzing the problem. Fixing it is trivial, for
instance by using
typedef struct _unaligned { char u[4]; } unaligned32;
in the patch I posted rather than
typedef apr_uint32_t __attribute__ ((aligned (1))) unaligned32;
> The way to fix this is to make sure that the macro
> SVN_UNALIGNED_ACCESS_IS_OK gives the correct answer; and it's OK if
> that answer compiler-specific, not just platform-specific. In other
> words, it's fine if the macro gives a different answer depending on
> GCC vectorizer options.
If it were up to me I'd revert r956593 entirely. Thinking you can
optimise memcpy is just plain wrong-headed, and this is the second bug
found in patterning_copy(). Hmm, actually if you want to optimise
patterning_copy() then something that might make sense is to detect
the overlap case and copy the original repeating sequence multiple
times, rather than copying what has just been written. That is likely
to give better cache performance especially on processors that suffer
from load-hit-store stalls.
--
Alan Modra
Australia Development Lab, IBM
Received on 2014-08-11 01:39:25 CEST