On Tue, Nov 11, 2014 at 8:40 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:
> Branko Čibej <brane_at_wandisco.com> writes:
>
> >> From the performance point of view there will be no big benefits to enable
> >> log addressing for an existing repository, because the existing old part
> >> of the repository will remain to be addressed physically.
> >
> > I disagree with your assessment. Certainly, as long as there are "live"
> > delta chains in the repository that reach all the way into
> > physically-indexed content, there will be less performance benefit from
> > logical addressing than in a "pure" FSFSv7. But this state will not
> > persist "forever", certainly not for actively changed content.
>
[working my way up through my TODO stack.]
> I did a small attempt in measuring the performance benefits of the mixed-mode
> addressing. Please note that these results are only provided for the Windows
> platform and only cover basic operations over two protocols. My tests were
> done under Windows 8.1 Professional (Apache HTTP Server 2.2.29, serf 1.3.8),
> a part of the batch file covering the 'file://' protocol is attached.
>
Thanks for running the tests, Evgeny. I accept r1637184 and would only like
to comment on a few details in your data, so that we don't operate on
different assumptions.
To manage users' expectations, I added a section to our release notes
that explains when and how format 7 will be useful:
http://subversion.apache.org/docs/release-notes/1.9#format7-comparison
> I used the http://tortoisesvn.googlecode.com/svn/ repository (25851 revisions)
> in my experiments. What I did was building 1.9.0 binaries from r1637183 and
> r1637184. Right after that I started examining, what would have happened with
> the performance in three different scenarios:
>
> - The whole repository (25851 revisions) received an upgrade to FSFS7, and all
>   revisions are physically addressed, i.e. no mixed-mode addressing happened.
>   This is the default upgrade behavior as of r1637184.
>
My mental model of "mixed addressing" is that a certain fraction x
of the request is carried out on new-format data and the remainder
on the old part. That should result in a linear combination of "old" and
"new" speed: mixed = x*new + (1-x)*old.
With an additional run on a completely new-format repository, we would be
able to estimate the "x" portion of a request that benefits from the new
format: x = (old - mixed) / (old - new). Because revision size and content
change over time as the project matures, this is only an estimate.
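
To make the model concrete, here is a minimal sketch of both formulas.
The function names and the timings are made up for illustration; they are
not measurements from this thread.

```python
def mixed_time(x: float, old: float, new: float) -> float:
    """Linear model: a fraction x of the request runs at "new" speed."""
    return x * new + (1.0 - x) * old


def estimate_x(old: float, mixed: float, new: float) -> float:
    """Invert the model: x = (old - mixed) / (old - new)."""
    if old == new:
        raise ValueError("old and new timings are equal; x is undefined")
    return (old - mixed) / (old - new)


# Hypothetical timings in seconds, for illustration only.
old, new = 20.0, 10.0
mixed = mixed_time(0.3, old, new)
x = estimate_x(old, mixed, new)  # recovers the 0.3 we put in
print(f"mixed = {mixed:.1f} s, x = {x:.2f}")
```

Note that the estimate becomes very noisy when "old" and "new" are close,
because the denominator approaches zero.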
> - The repository received an upgrade to FSFS7 with mixed-mode addressing and
>   has been accumulating new logically addressed revisions for one year. A
>   corresponding revision span is the following:
>
> (r24752, 9/10/2013 → r25851, 9/10/2014)
>
The new addressing scheme will only be used from the next shard on,
i.e. from r25000, and it will only speed things up once that shard has
been packed. Since there is no full shard yet, we won't see an improvement.
> - The repository received an upgrade to FSFS7 with mixed-mode addressing and
>   has been accumulating new logically addressed revisions for three years. A
>   corresponding revision span is the following:
>
> (r21959, 9/10/2011 → r25851, 9/10/2014)
>
This gives 3 reordered packed shards out of 25. We should expect
10..15% speedup in "svn log -v", reading all revs, and possibly more
for export / checkout, concentrating on later revisions.
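
The shard arithmetic behind those two statements can be sketched as
follows. This assumes a shard size of 1000 revisions (the FSFS default)
and that logical addressing starts with the first shard that begins after
the upgrade; the helper names are mine, not part of any svn API.

```python
SHARD_SIZE = 1000  # assumed: FSFS default shard size


def first_logical_shard(upgrade_rev: int, shard_size: int = SHARD_SIZE) -> int:
    """Index of the first shard that starts after the upgrade revision."""
    return upgrade_rev // shard_size + 1


def full_shards(youngest_rev: int, shard_size: int = SHARD_SIZE) -> int:
    """Number of complete (packable) shards; revisions count from r0."""
    return (youngest_rev + 1) // shard_size


def reordered_packed_shards(upgrade_rev: int, youngest_rev: int,
                            shard_size: int = SHARD_SIZE) -> int:
    """Complete shards that consist of logically addressed revisions only."""
    first = first_logical_shard(upgrade_rev, shard_size)
    return max(0, full_shards(youngest_rev, shard_size) - first)


youngest = 25851
for label, upgrade_rev in (("one year", 24752), ("three years", 21959)):
    n = reordered_packed_shards(upgrade_rev, youngest)
    print(f"{label}: {n} of {full_shards(youngest)} full shards reordered")
```

For the one-year case this yields 0 shards, for the three-year case 3 of
25, i.e. 12% of the packed shards.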
> In one year (r24752, 9/10/2013 → r25851, 9/10/2014), the performance boost
> from using the mixed addressing mode would be the following:
>
> (http://)
>
> svn-bench null-log unpacked 15.765 → 15.682 s (0.5 % gain)
> svn-bench null-log packed 16.811 → 16.400 s (2.4 % gain)
> svn-bench null-log -v unpacked 16.236 → 16.130 s (0.7 % gain)
> svn-bench null-log -v packed 17.166 → 16.921 s (1.4 % gain)
>
I assume you ran all tests from hot OS caches. If that is the case,
we won't see the differences in I/O. But at least your numbers show
that there is no significant difference in CPU load.
Apart from that, two effects are visible here: authz implies '-v' on the
request side, and packed revprops are slower than non-packed ones.
The latter has recently been fixed by faster parsers and smaller pack
size defaults.
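
As a sanity check on the quoted figures: the gain / loss percentages are
relative changes in wall-clock time, with lower being faster. A minimal
sketch to recompute them (the helper name is made up; the sample rows are
the http:// null-log numbers quoted above):

```python
def relative_change(before: float, after: float):
    """Return ("gain" or "loss", percent) for a timing change in seconds."""
    pct = (before - after) / before * 100.0
    return ("gain" if pct >= 0 else "loss", abs(pct))


rows = [
    ("null-log unpacked", 15.765, 15.682),
    ("null-log packed", 16.811, 16.400),
]
for name, before, after in rows:
    kind, pct = relative_change(before, after)
    print(f"{name}: {before:.3f} -> {after:.3f} s ({pct:.1f} % {kind})")
```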
> svn-bench null-export unpacked 43.808 → 43.644 s (0.4 % gain)
> svn-bench null-export packed 43.010 → 43.039 s (0.1 % loss)
>
Despite running multiple requests in parallel, this is much slower than
file:// access. The reason is that for every node, the mod_dav_svn
access pattern requires a full DAG walk starting at some "random"
revision.
There is an experimental patch somewhere in my backlog that effectively
eliminates this overhead. I'll polish it and post it to the dev@ list;
maybe it's something we want to fix in 1.9.
> (file://)
>
> svn-bench null-log unpacked 3.303 → 3.276 s (0.8 % gain)
> svn-bench null-log packed 5.902 → 5.947 s (0.8 % loss)
>
The parser overhead is very visible here.
> svn-bench null-log -v unpacked 12.530 → 12.688 s (1.3 % loss)
> svn-bench null-log -v packed 13.514 → 13.545 s (0.2 % loss)
> svn-bench null-export unpacked 12.362 → 12.434 s (0.6 % loss)
> svn-bench null-export packed 12.316 → 12.170 s (1.2 % gain)
>
> In three years (r21959, 9/10/2011 → r25851, 9/10/2014), the performance boost
> from using the mixed addressing mode would be the following:
>
> (http://)
>
> svn-bench null-log unpacked 15.765 → 15.313 s (2.9 % gain)
> svn-bench null-log packed 16.811 → 16.193 s (3.7 % gain)
> svn-bench null-log -v unpacked 16.236 → 15.596 s (3.9 % gain)
> svn-bench null-log -v packed 17.166 → 16.648 s (3.0 % gain)
> svn-bench null-export unpacked 43.808 → 43.930 s (0.3 % loss)
> svn-bench null-export packed 43.010 → 43.169 s (0.4 % loss)
>
> (file://)
>
> svn-bench null-log unpacked 3.303 → 3.413 s (3.3 % loss)
> svn-bench null-log packed 5.902 → 5.942 s (0.7 % loss)
> svn-bench null-log -v unpacked 12.530 → 12.458 s (0.6 % gain)
> svn-bench null-log -v packed 13.514 → 13.164 s (2.6 % gain)
> svn-bench null-export unpacked 12.362 → 12.945 s (4.7 % loss)
> svn-bench null-export packed 12.316 → 12.537 s (1.8 % loss)
>
For unpacked data, we wouldn't expect much of a difference.
Running the tests from entirely cold OS caches, I get about
15% faster null-exports with an "x" of 45% (ra_local on SSD).
Surprisingly, null-log -v is 25% faster with an "x" of 30%.
The reason is that later revisions happen to be 3x as large
as older ones, so reordering data saves much more I/O
in later pack files than it does for old ones. Hence, we see
twice the expected impact.
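
A rough plausibility check for that last point, as a sketch only: it
assumes 22 old and 3 reordered packed shards (the three-year scenario)
and that every reordered shard holds uniformly 3x the data of an old one.
Real shard sizes will vary, and the helper name is made up.

```python
def size_weighted_share(old_shards: int, new_shards: int,
                        size_ratio: float) -> float:
    """Fraction of the total data volume that lives in the new shards."""
    new_weight = new_shards * size_ratio
    return new_weight / (old_shards + new_weight)


by_count = 3 / 25                           # share by number of shards
by_size = size_weighted_share(22, 3, 3.0)   # share by data volume
print(f"by shard count: {by_count:.0%}, by data volume: {by_size:.0%}")
# prints "by shard count: 12%, by data volume: 29%"
```

Under these assumptions the reordered shards hold roughly 29% of the data,
which is in the same range as the "x" of 30% estimated for null-log -v.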
> I do not want to make any conclusions on this topic. However, my results do
> not show any obvious advantage of having the mixed-mode addressing enabled for
> the sample (http://tortoisesvn.googlecode.com/svn/) repository. Even after
> *three* years of logically addressed revisions landing into the repository,
> the performance gains still fluctuate around zero.
>
It is important to understand that there is no major structural
difference between physically and logically addressed repositories.
Both have the same item granularity and do the same pointer
chasing when reconstructing the contents. The differences are
a simple manifest in a separate file vs. a more complex index
structure in the same file as the actual rev data.
Overall, the CPU load is the same.
What is expensive, however, is deflate and checksumming
on the CPU side and turning the pointer chasing into a random
access orgy on the I/O side. The first issue is addressed by
the fulltext caches. Once in cache, no checksumming etc.
is necessary anymore. This is why they are so much faster
than reading data from OS caches.
The random I/O is much harder to eliminate and log. addressing
in FSFS gets it down by only 50% or so due to the rev shard
granularity. FSX will hopefully and eventually do a much better
job there.
-- Stefan^2.
Received on 2014-11-17 20:55:29 CET