Johan Corveleyn <jcorvel_at_gmail.com> writes:
> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...
A shell script is probably fine. What I want is some data that I can
use on my machine to test your patches.
Here's a crude Python script. With the default values it generates two
4.3MB files in less than 2 seconds on my machine. Subversion diff takes
over 10 seconds to compare the files, GNU diff less than one second.
Using --num-prefix=2 makes the script slightly slower, since it generates
more random numbers, and the time to run Subversion diff on the output
goes up to 2 min. GNU diff still takes a fraction of a second, and GNU
diff with --minimal takes 35s. So for big improvements you probably want
to concentrate on shortcut heuristics, rather than low-level optimisation.
#!/usr/bin/python

import random, sys
from optparse import OptionParser

random.seed('abc') # repeatable

def write_file_contents(f, num_lines, num_prefix, num_suffix,
                        percent_middle, unique):
    for i in range(num_lines):
        if num_prefix > 1:
            prefix = random.randint(1, num_prefix)
        else:
            prefix = 1
        line = str(prefix) + "-common-prefix-" + str(prefix)

        middle = random.randint(1, 100)
        if middle <= percent_middle:
            line += " " + str(12345678 + i) + " "
        else:
            line += " " + str(9999999999 + i) + unique + " "

        if num_suffix > 1:
            suffix = random.randint(1, num_suffix)
        else:
            suffix = 1
        line += str(suffix) + "-common-suffix-" + str(suffix)
        f.write(line + '\n')

parser = OptionParser('Generate files for diff')
parser.add_option('--num-lines', type=int, default=100000, dest='num_lines',
                  help='number of lines, default 100000')
parser.add_option('--num-prefix', type=int, default=1, dest='num_prefix',
                  help='number of distinct prefixes, default 1')
parser.add_option('--num-suffix', type=int, default=1, dest='num_suffix',
                  help='number of distinct suffixes, default 1')
parser.add_option('--percent-middle', type=int, default=99,
                  dest='percent_middle',
                  help='percentage matching middles, default 99')
(options, args) = parser.parse_args(sys.argv)

f1 = open('file1.txt', 'w')
write_file_contents(f1, options.num_lines,
                    options.num_prefix, options.num_suffix,
                    options.percent_middle, 'a')

f2 = open('file2.txt', 'w')
write_file_contents(f2, options.num_lines,
                    options.num_prefix, options.num_suffix,
                    options.percent_middle, 'b')
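
To repeat the timings quoted above on another machine, something like
the following rough sketch might do. It is not part of the generator
script: the helper name time_command and the idea of passing each
command line as one quoted argument are just assumptions made for
illustration. It runs each command once, throws the output away and
reports the wall-clock time.

```python
#!/usr/bin/python
import shlex
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments); return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    # The output is discarded, only the running time is of interest.
    # diff tools exit with status 1 when the files differ, so a non-zero
    # exit status is reported rather than treated as an error.
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


if __name__ == '__main__':
    # Each command-line argument is one complete command to time.
    for arg in sys.argv[1:]:
        elapsed, code = time_command(shlex.split(arg))
        print('%8.2fs  exit=%d  %s' % (elapsed, code, arg))
```

For example, assuming the sketch is saved as timediff.py (a made-up
name):

  python timediff.py "diff file1.txt file2.txt" \
                     "diff --minimal file1.txt file2.txt"

A single run per command is a crude measurement, but the differences
discussed above are large enough that it should show them.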
--
Philip
Received on 2011-01-05 14:18:14 CET