Re: some more performance benchmarks

From: Neels Hofmeyr <neels_at_elego.de>
Date: Thu, 07 Apr 2011 16:42:43 +0200

On Thu, 2011-04-07 at 08:03 -0500, Hyrum K Wright wrote:

> This is great work, though as you admit the results are bit
> incomprehensible at this stage. Would it make sense to somehow

Hmm, I think stsp was a bit early to post, I'm still in the sanity
checking phase ;)

E.g.,:

[[[
Timings for 5x5_1.6
N min max avg operation (unit is seconds)
6 330.544 561.406 425.169 TOTAL RUN
]]]

--> variation of 330s to 561s in 6 identical runs?
(later note: I just found the random seeding might be in the wrong
place. I recently refactored a few things and it should prbly go inside
the N loop now.)

I have no idea about other loads on our vm. Maybe we could use the
user/sys output of 'time -p' instead of the real-world time measurements
it is taking right now, although my intention was to stick to the actual
real world time. Of course that works best on an otherwise idle system.
danielsh, or anyone, do you have any idea at what times-of-day (or
something) the VM host is most likely to idle for ~2 hours, so we can
schedule the tests then?

> combine the scripts to prevent the bifurcation of benchmark suites?

For the curious, I attached the current state of my benchmark scripts,
but I'm still editing.

> Do you have the script of commands, and the accompanying repo?

ATM my scripts are in an elego-private repo, they were initially going
to be just a quick test to have real timing results in an article about
1.7.

Actually, I think they have matured enough to be committed in tools/dev.
If Mark does the same we could have a collective look at them and
decide. I haven't actually seen Mark's results nor scripts yet. I have
posted mine before, older versions, a few months ago...
So where to put it?

tools/dev/benchmark/A <--- mark's benches
tools/dev/benchmark/B <--- my benches
???

I'm reluctant to label the dir with my name...
It'd be nice to find two descriptive words to distinguish between the
benchmarks, "by type".

In reply to CMike, an outline of what my test does:

I have a little python magic I use to call svn subcommands, calling the
cmdline svn client from py. Each call is being timed and added to a set
of N calls of the same subcommand.
So it has no understanding of whether it got 'svn delete one/file.c' or
'svn delete huge/tree/*', it simply adds another sample. So that's
almost madness.
But it goes on and runs the same identical sequence a number of times,
currently 6. Their values are averaged, and the min/max are retained to
have some sense of the real-world meaning of the average.

These 6 identical runs are done once for 1.6.x and once for trunk. They
are identical because I use a pseudo random number generator (to get
"organic" modifications) that is seeded with 0 for each run :P

Finally, the averages/mins/maxes for 1.6.x are compared to those of
trunk, showing a factor and a difference in seconds. (A factor of 10
were negligible if the actual time added was only 2 microseconds...)

Because of this "madness" I mentioned above (variation in single calls),
the *averages* are what's really interesting to look at. Even though an
average for 1.6 may be made up of (60 + 0.06)/2, trunk's average could
be (90 + 0.01)/2, and still shows the worst case pretty visibly (a long
call taking even longer). You can stare at the min/max differences to
make up your mind as to whether the averages are actual timings in
seconds or unitless fantasy ratings. Yet I am pretty confident that they
sufficiently compare the two clients.

In stsp's post, every single run was printed out. The really interesting
places to look though are where it says "Results for"... and
"COMPARE"...

[[[
COMPARE 5x5_1.6 to 5x5_trunk
  1.23|+0.45 means factor=1.23, difference in seconds = 0.45
  factor < 1 or difference < 0 means '5x5_trunk' is faster than
'5x5_1.6'
      min max avg operation
  1.151|+49.7471 0.863|-76.9170 1.036|+15.1526 TOTAL RUN
  1.322|+0.0026 0.631|-1.4086 0.938|-0.0019 add
  1.864|+0.0113 0.413|-11.9194 0.513|-3.5088 checkout
  0.625|-2.3169 0.533|-24.8236 0.769|-4.0490 commit
  0.899|-0.0030 0.129|-0.7784 0.122|-0.3825 copy
44.382|+100.3599 16.604|+122.9722 24.856|+109.4408 delete
  1.245|+1.6928 0.943|-1.2191 0.768|-3.3842 merge
  1.258|+0.0023 0.263|-1.8166 1.189|+0.0028 mkdir
  1.256|+0.0022 0.187|-2.3871 0.910|-0.0016 prop mod
  1.237|+0.0021 0.153|-0.4198 0.480|-0.0174 propdel
  1.267|+0.0020 0.267|-0.2795 1.544|+0.0050 proplist
  1.267|+0.0023 0.091|-5.4157 1.018|+0.0003 propset
  1.148|+0.0013 2.347|+0.0133 1.254|+0.0024 resolve
  2.764|+0.5885 0.878|-0.3298 1.565|+0.5413 resolved
  1.410|+0.0733 3.061|+1.9744 2.038|+0.4877 status
  0.602|-3.3563 0.422|-8.9009 0.470|-6.4426 switch
  1.622|+0.0934 0.302|-21.6808 0.421|-3.9173 update
]]]

An interpretation could be: 'svn delete' currently sucks for some reason
(24 times slower???), as well as 'svn status' (twice as slow, but then
again it only added half a second to a run that was >400 seconds long,
on average). The rest looks quite encouraging. IIRC, when I ran this a
few months back I had higher slowness factor numbers than this (and,
notably, a much saner 'svn delete' factor, so there must be some newly
introduced speed bug in trunk's 'svn delete' -- or in my script).
Interesting to note is that the *variation* in time it takes for a run
became somewhat smaller using trunk. So trunk's timings seem to be
slightly more homogeneous than 1.6.x's (looking at the TOTAL min/max).

Now you may note that there are several types of these, i.e. 5x5, 1x100
and 100x1. 5x5 means there is a WC depth of 5 directories, each one
holding 5 subdirs and 5 files. 1x100 is a flat directory with 100 dir
and file entries. 100x1 is a cascade of single subdirs that is 100 dir
levels deep, also with one file on each subdir level.

The 100x1 case is actually where the 1.6 client sucks very badly -- it's
visible in timing but even worse memory wise. Watching `top` while it
runs reveals a memory footprint going towards gigabyte scale.
Consequently, this is where trunk really really ROCKS because it does
the same job with a few mb of RAM, in a fraction of the time.

I'm playing with the thought of making graphical charts of these
numbers, but that might be overkill :)

Any ideas/opinions about adding to tools/dev/ would be appreciated.

Finally, the interesting part that shows what the script tests:

[[[
run_cmd(['svnadmin', 'create', repos])
svn('checkout', file_url, wc)

  trunk = j(wc, 'trunk')
  create_tree(trunk, levels, spread)
  add(trunk)
  st(wc)
  ci(wc)
  up(wc)
  propadd_tree(trunk, 0.5)
  ci(wc)
  up(wc)
  st(wc)

trunk_url = file_url + '/trunk'
branch_url = file_url + '/branch'

svn('copy', '-mm', trunk_url, branch_url)
st(wc)

up(wc)
st(wc)

  svn('checkout', trunk_url, wc2)
  st(wc2)
  modify_tree(wc2, 0.5)
  st(wc2)
  ci(wc2)
  up(wc2)
  up(wc)

  svn('switch', branch_url, wc2)
  modify_tree(wc2, 0.5)
  st(wc2)
  ci(wc2)
  up(wc2)
  up(wc)

  modify_tree(trunk, 0.5)
  st(wc)
  ci(wc)
  up(wc2)
  up(wc)

  svn('merge', '--accept=postpone', trunk_url, wc2)
  st(wc2)
  svn('resolve', '--accept=mine-conflict', wc2)
  st(wc2)
  svn('resolved', '-R', wc2)
  st(wc2)
  ci(wc2)
  up(wc2)
  up(wc)

  svn('merge', '--accept=postpone', '--reintegrate', branch_url, trunk)
  st(wc)
  svn('resolve', '--accept=mine-conflict', wc)
  st(wc)
  svn('resolved', '-R', wc)
  st(wc)
  ci(wc)
  up(wc2)
  up(wc)

  svn('delete', j(wc, 'branch'))
  ci(wc)
  up(wc2)
  up(wc)
]]]

Note that I still have a few plans for tweaks I'm not discussing because
this mail is long enough.

~Neels

text/x-python attachment: benchmark.py

application/x-shellscript attachment: run

Received on 2011-04-07 16:43:47 CEST

This message: [ Message body ]
Next message: C. Michael Pilato: "1.7 Roadmap Items Evaluation"
Previous message: Daniel Shahaf: "Re: XML parsing and checksum mismatch errors at updates and checkouts"
In reply to: Hyrum K Wright: "Re: some more performance benchmarks"
Next in thread: Daniel Shahaf: "Re: some more performance benchmarks"
Reply: Daniel Shahaf: "Re: some more performance benchmarks"
Reply: Daniel Shahaf: "Re: some more performance benchmarks"
Reply: Greg Stein: "Re: some more performance benchmarks"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]