> -----Original Message-----
> From: Stefan Fuhrmann [mailto:stefanfuhrmann_at_alice-dsl.de]
> Sent: Friday, 21 May 2010 0:48
> To: dev_at_subversion.apache.org
> Subject: [PATCH v3] speed up svn_txdelta_apply_instructions
>
> Hi there,
>
> this is an improved version of the patch posted here:
>
> http://svn.haxx.se/dev/archive-2010-05/0002.shtml
>
> The improvements address the issues listed there:
>
> http://svn.haxx.se/dev/archive-2010-05/0216.shtml
>
> -- Stefan^2.
>
>
> [[[
> svn_txdelta_apply_instructions is relatively slow for long
> instruction sequences copying small pieces of data. This
> seems to be particularly visible in non-packed FSFS
> repositories.
>
> This patch hoists invariants out of the 'for' loop and
> speeds up both overlapping copies and copies of small
> amounts of data.
>
> * subversion/libsvn_delta/text_delta.c
> (fast_memcpy, patterning_copy): new functions,
> optimized for our specific workload
> (svn_txdelta_apply_instructions): reduce loop overhead,
> use fast_memcpy and patterning_copy
>
> patch by stefanfuhrmann < at > alice-dsl.de
> ]]]
+/* Unlike memmove() or memcpy(), create repeating patterns when
+ * source and target range overlap. Returns a pointer to the first
+ * byte after the copied target range.
+ */
+static APR_INLINE char*
+patterning_copy(char *target, const char *source, apr_size_t len)
+{
+  const char *end = source + len;
+
+  /* Copy in larger chunks if source and target don't overlap
+   * closer than the size of the chunks (or don't overlap at all).
+   * Use the natural machine word as chunk size
+   * (for some architectures size_t is even a bit larger).
+   */
+  if (end + sizeof(apr_size_t) <= target)
+    for (; source + sizeof (apr_size_t) <= end;
+           source += sizeof (apr_size_t),
+           target += sizeof (apr_size_t))
+      *(apr_size_t*)(target) = *(apr_size_t*)(source);
+
+  /* Copy trailing bytes */
+  for (; source != end; source++)
+    *(target++) = *source;
+
+  return target;
+}
+
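
To make the "repeating patterns" behaviour in the doc comment concrete,
here is a minimal standalone sketch of the byte-wise path only. It uses
plain size_t instead of apr_size_t, and the names bytewise_pattern_copy
and pattern_demo are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the trailing byte loop of patterning_copy():
 * a strictly forward, byte-by-byte copy. When the target starts inside
 * the source range, bytes written earlier are read again later, so the
 * leading bytes are replicated as a pattern. */
static char *
bytewise_pattern_copy(char *target, const char *source, size_t len)
{
  const char *end = source + len;

  assert(target != NULL && source != NULL);

  for (; source != end; source++)
    *(target++) = *source;

  return target;
}

/* Usage sketch: expand the 2-byte seed "ab" by copying 6 bytes from
 * buf to buf + 2. Returns 1 if the seed was replicated as a pattern. */
static int
pattern_demo(void)
{
  char buf[16] = "ab";
  char *stop = bytewise_pattern_copy(buf + 2, buf, 6);

  *stop = '\0';
  return strcmp(buf, "abababab") == 0;
}
```

memmove() on the same arguments would instead preserve the original six
source bytes, which is why it cannot be used for overlapping copies here.
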
patterning_copy() should check the alignment of source and target before
copying in apr_size_t-sized chunks. Otherwise the chunked copy can be much
slower than the original byte-wise version: on some architectures an
unaligned access is handled entirely in software, from within an
exception handler.
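
A rough sketch of what such a check could look like, as a standalone
variant and not a tested change to the patch. It assumes standard C
types (size_t and uintptr_t in place of the APR typedefs), keeps the
patch's cast-based word copy and its overlap condition unchanged, and
adds only the alignment test. The names both_word_aligned and
aligned_patterning_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: non-zero if both
 * pointers are aligned to the chunk size used below. */
static int
both_word_aligned(const void *a, const void *b)
{
  return (((uintptr_t)a | (uintptr_t)b) % sizeof(size_t)) == 0;
}

/* Same contract as patterning_copy() in the patch: forward copy that
 * creates repeating patterns on overlap and returns a pointer to the
 * first byte after the copied target range. */
static char *
aligned_patterning_copy(char *target, const char *source, size_t len)
{
  const char *end = source + len;

  assert(target != NULL && source != NULL);

  /* Word-sized chunks only if the ranges are far enough apart (the
   * patch's condition) AND both pointers are word-aligned (the added
   * check). Both pointers advance by whole words, so alignment needs
   * to be tested only once, before the loop. */
  if (end + sizeof(size_t) <= target && both_word_aligned(target, source))
    for (; source + sizeof(size_t) <= end;
           source += sizeof(size_t),
           target += sizeof(size_t))
      *(size_t *)target = *(const size_t *)source;

  /* Trailing bytes, and everything in the unaligned or overlapping case. */
  for (; source != end; source++)
    *(target++) = *source;

  return target;
}
```

With this variant, unaligned input simply falls back to the byte loop, so
it should never be slower than the original on strict-alignment machines.
Whether the extra test costs anything measurable for very short copies
would need to be benchmarked.
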
Bert
Received on 2010-05-21 13:51:08 CEST