> -----Original Message-----
> From: Stefan Fuhrmann [mailto:stefanfuhrmann_at_alice-dsl.de]
> Sent: Tuesday, 27 April 2010 1:10
> To: Bert Huijben; dev_at_subversion.apache.org
> Subject: Re: [PATCH] Saving a few cycles, part 1/2
>
> Bert Huijben wrote:
> >
> >> -----Original Message-----
> >> From: Stefan Fuhrmann [mailto:stefanfuhrmann_at_alice-dsl.de]
> >> In this patch, I eliminated calls to memcpy for small copies as they are
> >> particularly expensive in the MS CRT.
> >>
> >
> > Which CRT did you use for these measurements? (2005, 2008, 2010, Debug vs
> > Release and DLL vs Static?). Which compiler version? (Standard/Express or
> > Professional+). (I assume you use the normal Subversion build using .sln
> > files and not the TortoiseSVN scripts? Did you use the shared library builds
> > or a static build)?
> >
> VSTS2008 Developer Edition. Release build (am I an Amateur?!)
> TSVN build scripts which set /Ox (global opt, intrinsics, omit frame
> pointers, ...)
> > Did you try enabling the intrinsics for this method instead of using a
> > handcoded copy?
> >
> <mode="educational prick">
> Yes, but it does not help in this case: memset will use intrinsics
> only for short (<48 bytes on x86) _fixed-size_ buffers. memcpy
> will use intrinsics for _fixed-size_ buffers only, but seemingly with
> no size limit.

But did you try a non-shared library build?

If you use the C runtime as a shared library, things like using __fastcall
instead of __cdecl or full program optimization don't matter, as you don't
change msvcr90.dll (or a later version) in your build. The overhead of
calling a function in a DLL is probably bigger than whatever you gain by
hand-coding your memcpy().

In your first mail you said "In this patch, I eliminated calls to memcpy
for small copies as they are particularly expensive in the MS CRT."

Did you compare it to other toolchains?
And did you compare it to a completely static build that does not reference
msvcr90.dll?
Were these functions on other toolchains compiled into the binary, or were
they also in an external DLL?

When comparing 7-byte buffer copies, things like doing an indirect call for
a library function have a much bigger impact than shaving a few assembler
instructions off the loop itself, so maybe just passing /MT instead of /MD
to the compiler makes the same difference. (It will certainly help the full
program optimization.)
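
To make that comparison concrete, here is a minimal sketch of the kind of
micro-benchmark that could separate the two effects. It is only an
illustration under assumptions, not the code from the patch: copy_small,
time_memcpy and time_hand_copy are hypothetical names, and the byte loop is
just one possible hand-coded copy. The size is read through a volatile so
the compiler cannot treat it as a fixed-size copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical hand-coded copy for short buffers (NOT the patch's code). */
static void copy_small(char *dst, const char *src, size_t len)
{
    while (len--)
        *dst++ = *src++;
}

/* Time `iterations` memcpy() calls of `len` bytes (len <= 16).
   Returns elapsed CPU time in seconds. With /MD the memcpy call may
   go through the CRT DLL; with /MT it is linked statically. */
static double time_memcpy(size_t len, long iterations)
{
    static const char src[16] = "0123456789abcde";
    char dst[16] = {0};
    volatile size_t n = len;  /* keep the size out of constant folding */
    volatile char sink = 0;   /* keep the copies from being optimized away */
    clock_t start, stop;
    long i;

    assert(len <= sizeof dst);
    start = clock();
    for (i = 0; i < iterations; i++) {
        memcpy(dst, src, n);
        sink = dst[0];
    }
    stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Same loop, but using the hand-coded copy instead of memcpy(). */
static double time_hand_copy(size_t len, long iterations)
{
    static const char src[16] = "0123456789abcde";
    char dst[16] = {0};
    volatile size_t n = len;
    volatile char sink = 0;
    clock_t start, stop;
    long i;

    assert(len <= sizeof dst);
    start = clock();
    for (i = 0; i < iterations; i++) {
        copy_small(dst, src, n);
        sink = dst[0];
    }
    stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling both functions from a small main() with len = 7 and a few million
iterations, and building the file once with "cl /Ox /MD" and once with
"cl /Ox /MT", should show whether the difference comes from the copy loop
or from how the CRT is linked. The numbers will vary by machine and
compiler, so treat any single run with caution.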

Looking at the TortoiseSVN build and my local set of binaries, I see that
most of its libraries use MSVCR90.DLL, and that at least a few code paths
call memcpy() from that DLL.

Bert
Received on 2010-04-27 10:51:17 CEST