
Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

From: Danny Trebbien <dtrebbien_at_gmail.com>
Date: Mon, 29 Nov 2010 17:25:38 -0800

Attached is a benchmark and Makefile that I used to test the speed of
svn_subst_translate_string() from trunk versus the new
svn_subst_translate_string2().  The program reads a text file named
`2600.txt` in the current working directory and repeatedly calls
svn_subst_translate_string() on the contents.  For `2600.txt`, I used
the plain text version of War and Peace from Project Gutenberg
(http://www.gutenberg.org/ebooks/2600.txt.utf8).
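
For readers without the attachment, the general shape of such a benchmark
can be sketched as follows. This is only an illustration in Python, not the
attached C program: `translate_eols` is a hypothetical stand-in for the
function under test, and the iteration and run counts are arbitrary
assumptions rather than the values used for the timings in this mail.

```python
import time


def translate_eols(data: bytes, eol: bytes = b"\n") -> bytes:
    """Hypothetical stand-in for the function under test.

    It only normalizes line endings; it is NOT svn_subst_translate_string().
    """
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", eol)


def time_repeated_calls(func, data, iterations):
    """Return the CPU time (seconds) spent calling func(data) `iterations` times."""
    start = time.process_time()
    for _ in range(iterations):
        func(data)
    return time.process_time() - start


def collect_samples(func, data, iterations, runs):
    """Repeat the timed loop `runs` times to get one sample per run."""
    return [time_repeated_calls(func, data, iterations) for _ in range(runs)]


def benchmark_file(path, iterations=100, runs=30):
    """Read the input file once, then time repeated translation of its contents."""
    with open(path, "rb") as f:
        contents = f.read()
    return collect_samples(translate_eols, contents, iterations, runs)
```

Calling `benchmark_file("2600.txt")` would produce one timing per run, which
is the kind of sample vector that the t-tests in this mail compare.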

The data that I generated for trunk_at_1040115 were:
trunk_at_1040115 <- c(7780000, 7910000, 7870000, 7660000, 7840000,
7760000, 7620000, 7500000, 7860000, 7800000, 7640000, 7740000,
7760000, 7850000, 8010000, 7800000, 7730000, 7700000, 7900000,
7760000, 7790000, 7970000, 7700000, 7710000, 7990000, 7830000,
7780000, 7810000, 7730000, 7600000)

The data for the "HEAD" sources (commit
6f828b0a4e07d1e14189b9b8c84bd0f884c59164 from my repo;
https://github.com/dtrebbien/subversion/tree/6f828b0a4e07d1e14189b9b8c84bd0f884c59164)
were:
HEAD <- c(8050000, 8230000, 7980000, 8150000, 7950000, 8600000,
8080000, 8420000, 8000000, 8020000, 8420000, 7960000, 8010000,
8200000, 8080000, 8490000, 8190000, 7920000, 7820000, 7780000,
7880000, 8540000, 7970000, 8250000, 8830000, 8540000, 8310000,
8270000, 8010000, 7990000)

Note: This is not "version 3" of the patch. It is essentially
trunk_at_1040115 plus "version 3" plus this changeset:
https://github.com/dtrebbien/subversion/commit/d22329a54dcf58cddc2b618f913597c6defbcb2d

A one-sided two-sample t-test (equal variances assumed) allows us to
conclude with high confidence (p = 2.35e-10) that the mean time to run
the benchmark with libsvn_subr-1 compiled from trunk_at_1040115 is less
than the mean time to run the benchmark with libsvn_subr-1 compiled from
the HEAD sources:
> t.test(trunk_at_1040115, HEAD, alternative = "less", var.equal = TRUE, conf.level = 0.90)

        Two Sample t-test

data: trunk_at_1040115 and HEAD
t = -7.473, df = 58, p-value = 2.350e-10
alternative hypothesis: true difference in means is less than 0
90 percent confidence interval:
      -Inf -317939.7
sample estimates:
mean of x mean of y
  7780000 8164667
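
The t statistic and degrees of freedom reported above can be cross-checked
without R. The following is a minimal sketch in Python of the pooled
(equal-variance) two-sample t statistic, using the two data sets exactly as
listed in this mail. The function name is an assumption for illustration,
and the sketch does not compute the p-value.

```python
import math

trunk = [7780000, 7910000, 7870000, 7660000, 7840000, 7760000, 7620000,
         7500000, 7860000, 7800000, 7640000, 7740000, 7760000, 7850000,
         8010000, 7800000, 7730000, 7700000, 7900000, 7760000, 7790000,
         7970000, 7700000, 7710000, 7990000, 7830000, 7780000, 7810000,
         7730000, 7600000]

head = [8050000, 8230000, 7980000, 8150000, 7950000, 8600000, 8080000,
        8420000, 8000000, 8020000, 8420000, 7960000, 8010000, 8200000,
        8080000, 8490000, 8190000, 7920000, 7820000, 7780000, 7880000,
        8540000, 7970000, 8250000, 8830000, 8540000, 8310000, 8270000,
        8010000, 7990000]


def pooled_t_statistic(x, y):
    """Return (t, df) for a two-sample t-test that assumes equal variances."""
    n1, n2 = len(x), len(y)
    mean1 = math.fsum(x) / n1
    mean2 = math.fsum(y) / n2
    # Sums of squared deviations from each sample mean.
    ss1 = math.fsum((v - mean1) ** 2 for v in x)
    ss2 = math.fsum((v - mean2) ** 2 for v in y)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


t_stat, df = pooled_t_statistic(trunk, head)
print(f"t = {t_stat:.3f}, df = {df}")
```

The printed values should agree with the `t` and `df` in the R output for
trunk_at_1040115 versus HEAD.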

I realized, however, that this is not a fair comparison because the
HEAD sources simply call svn_subst_translate_string2() within
svn_subst_translate_string(), meaning that there is an extra layer of
indirection. After modifying the benchmark to call
svn_subst_translate_string2() directly, I generated these timings:
HEAD_new <- c(7850000, 7890000, 8080000, 7980000, 7820000, 7880000,
7850000, 7540000, 8470000, 8230000, 8410000, 7880000, 7410000,
7490000, 7420000, 7650000, 7430000, 7430000, 7530000, 7720000,
7940000, 7780000, 8070000, 7840000, 7870000, 7970000, 7690000,
7910000, 7860000, 7620000)

Now we cannot reject the null hypothesis that the mean time to run the
benchmark with libsvn_subr-1 compiled from trunk_at_1040115 is greater
than or equal to the mean time to run the modified benchmark with
libsvn_subr-1 compiled from the HEAD sources:
> t.test(trunk_at_1040115, HEAD_new, alternative = "less", var.equal = TRUE, conf.level = 0.90)

        Two Sample t-test

data: trunk_at_1040115 and HEAD_new
t = -0.6839, df = 58, p-value = 0.2484
alternative hypothesis: true difference in means is less than 0
90 percent confidence interval:
     -Inf 33129.55
sample estimates:
mean of x mean of y
  7780000 7817000
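
The one-sided confidence bound in this output can be reproduced in the same
way: it is the difference in means plus a critical t value times the pooled
standard error. The sketch below is in Python and makes one assumption that
is not in the R transcript: the critical value 1.2963 is an approximate
table value for the 0.90 quantile of the t distribution with 58 degrees of
freedom, hard-coded here to avoid depending on a statistics library.

```python
import math

trunk = [7780000, 7910000, 7870000, 7660000, 7840000, 7760000, 7620000,
         7500000, 7860000, 7800000, 7640000, 7740000, 7760000, 7850000,
         8010000, 7800000, 7730000, 7700000, 7900000, 7760000, 7790000,
         7970000, 7700000, 7710000, 7990000, 7830000, 7780000, 7810000,
         7730000, 7600000]

head_new = [7850000, 7890000, 8080000, 7980000, 7820000, 7880000, 7850000,
            7540000, 8470000, 8230000, 8410000, 7880000, 7410000, 7490000,
            7420000, 7650000, 7430000, 7430000, 7530000, 7720000, 7940000,
            7780000, 8070000, 7840000, 7870000, 7970000, 7690000, 7910000,
            7860000, 7620000]

# Assumption: approximate 0.90 quantile of Student's t with 58 degrees of
# freedom, taken from a t-table rather than computed.
T_CRIT_90_DF58 = 1.2963


def pooled_standard_error(x, y):
    """Standard error of mean(x) - mean(y), assuming equal variances."""
    n1, n2 = len(x), len(y)
    mean1 = math.fsum(x) / n1
    mean2 = math.fsum(y) / n2
    ss1 = math.fsum((v - mean1) ** 2 for v in x)
    ss2 = math.fsum((v - mean2) ** 2 for v in y)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))


mean_diff = math.fsum(trunk) / len(trunk) - math.fsum(head_new) / len(head_new)
se = pooled_standard_error(trunk, head_new)
t_stat = mean_diff / se
# Upper end of the interval (-Inf, upper] for alternative = "less".
upper = mean_diff + T_CRIT_90_DF58 * se
print(f"t = {t_stat:.4f}, upper bound = {upper:.1f}")
```

Because the upper bound is positive, the interval contains zero, which is
another way of seeing that the null hypothesis is not rejected at this
level.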

One other set of timings that I generated was for the modified
benchmark running with libsvn_subr-1 compiled from the HEAD sources,
slightly modified to set `repair` to TRUE:
HEAD_new_repair <- c(7660000, 7560000, 7570000, 7540000, 7670000,
7790000, 7460000, 7840000, 8060000, 7790000, 8000000, 7830000,
8370000, 8010000, 7730000, 7800000, 7900000, 7730000, 7730000,
7790000, 7750000, 7930000, 7860000, 7810000, 7930000, 7840000,
7890000, 7460000, 7790000, 7730000)

We cannot reject the null hypothesis that the mean time to run the
modified benchmark with libsvn_subr-1 compiled from the HEAD sources
is the same as the mean time to run the modified benchmark with
libsvn_subr-1 compiled from slightly-modified HEAD sources (`repair`
is set to TRUE):
> t.test(HEAD_new, HEAD_new_repair, var.equal = TRUE, conf.level = 0.90)

        Two Sample t-test

data: HEAD_new and HEAD_new_repair
t = 0.3815, df = 58, p-value = 0.7042
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -77774.74 123774.74
sample estimates:
mean of x mean of y
  7817000 7794000

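Nor can we reject the null hypothesis that the mean time to run the
benchmark with libsvn_subr-1 compiled from trunk_at_1040115 is greater
than or equal to the mean time to run the modified benchmark with
libsvn_subr-1 compiled from the slightly-modified HEAD sources (`repair`
is set to TRUE):
> t.test(trunk_at_1040115, HEAD_new_repair, alternative = "less", var.equal = TRUE, conf.level = 0.90)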

        Two Sample t-test

data: trunk_at_1040115 and HEAD_new_repair
t = -0.3501, df = 58, p-value = 0.3638
alternative hypothesis: true difference in means is less than 0
90 percent confidence interval:
     -Inf 37836.36
sample estimates:
mean of x mean of y
  7780000 7794000

Therefore, I do not have evidence to support my earlier claim: "3.)
This penalizes repair translations."

My conclusion from all of this is that, regardless of the value of
`repair`, these timings show no statistically significant decrease in
the performance of svn_subst_translate_string() from my changes, as
long as svn_subst_translate_string2() is called directly.

Received on 2010-11-30 02:26:18 CET

This is an archived mail posted to the Subversion Dev mailing list.