
Re: [PATCH] delta_files() speedup 2/3: keyword substitution

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Tue, 30 Mar 2010 02:26:56 +0200

On Monday 29 March 2010 23:38:04 you wrote:
> > From: Philip Martin [mailto:philip.martin_at_wandisco.com]
> > Julian Foad <julian.foad_at_wandisco.com> writes:
> > >> * subversion/libsvn_subr/subst.c
> > >> (translation_baton): the 'interesting' member is now
> > >> a boolean array.
> > >> (create_translation_baton): adapt initialization code
> > >> (translate_chunk): eliminate call to strchr
> > >>
> > >> patch by stefanfuhrmann < at > alice-dsl.de
> > >> ]]]
> > >
> > > This patch looks lovely, from the point of view of a read-through
> > > review.
> >
> > Agreed.
> >
> > To get rid of the initialization we could use 4 static constant arrays
> > (we could even partially overlap them to save memory), but that's
> > probably not a significant improvement.
>
> I'm not sure how all this compares to just three byte compares, but with a
> only a few kb first level cache in most current x86 processors it might be
> even more optimal to just do the comparison in code.
>
> But I think any solution that avoids calling the locale dependent strchr()
> function will help here and the details between the table and in-code
> variants are probably not measurable.

Since the lookup array is very small (1/4 KB), it fits easily into L1.
In fact, the initialization puts it there more efficiently than an
initial load from RAM ever could. The initialization should take no
more than 50 clock ticks on contemporary processors, i.e. about a
typical L3 latency.

Therefore, using 3 or 4 pre-calculated tables would increase code
complexity and could even be slower. That slowdown would hardly
be measurable, though.
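
For readers without the patch at hand, here is a minimal sketch of the
lookup-table idea. It is not the code from subst.c: the names
lookup_sketch_t, lookup_sketch_init and lookup_sketch_skip are made up
for illustration, and the set of interesting characters is simply
passed in as a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patched translation baton: one flag
   per possible byte value, 256 bytes (1/4 KB) in total. */
typedef struct lookup_sketch_t {
  unsigned char interesting[256];
} lookup_sketch_t;

/* Mark every byte in CHARS (a NUL-terminated string such as "$\r\n")
   as interesting.  The memset writes the whole table, so it is in the
   cache before the scan loop starts. */
static void
lookup_sketch_init(lookup_sketch_t *b, const char *chars)
{
  memset(b->interesting, 0, sizeof(b->interesting));
  for (; *chars; ++chars)
    b->interesting[(unsigned char)*chars] = 1;
}

/* Return the number of leading bytes in BUF[0..LEN) that are not
   interesting: one table load and one test per byte, no strchr(). */
static size_t
lookup_sketch_skip(const lookup_sketch_t *b, const char *buf, size_t len)
{
  size_t i = 0;
  while (i < len && !b->interesting[(unsigned char)buf[i]])
    ++i;
  return i;
}
```

The same loop serves all combinations of keyword and EOL translation;
only the table contents differ, which is why nothing has to be decided
per byte.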

Using 3 (additional) compares has two problems. First, it would
result in up to 5 jump operations close together. That overloads
the branch prediction logic and causes pipeline stalls (it handles no
more than 3 jumps in any instruction decoder window). More importantly,
we would have to decide dynamically which tests to perform ('$',
'\r\n' or both). That would definitely be slower than the 2..3 cycles
per iteration achieved by the original patch.
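
To make the comparison concrete, the in-code variant would look
roughly like the sketch below. The function compare_sketch_skip and
its two flag parameters are hypothetical, chosen only to show where
the extra tests come from; a compiler may well rearrange them.

```c
#include <stddef.h>

/* Return the number of leading bytes in BUF[0..LEN) that need no
   translation.  CHECK_KEYWORDS and CHECK_EOL are hypothetical flags
   selecting which characters count as interesting. */
static size_t
compare_sketch_skip(const char *buf, size_t len,
                    int check_keywords, int check_eol)
{
  size_t i;
  for (i = 0; i < len; ++i)
    {
      char c = buf[i];

      /* Each test below is a potential conditional jump, and the two
         flags are re-evaluated for every single byte. */
      if (check_keywords && c == '$')
        break;
      if (check_eol && (c == '\r' || c == '\n'))
        break;
    }
  return i;
}
```

Every byte passes through the loop condition plus several data- and
flag-dependent tests, which is the cluster of jumps described above.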

-- Stefan^2.
Received on 2010-03-30 02:27:22 CEST

This is an archived mail posted to the Subversion Dev mailing list.
