Re: Parallelizing the python testsuite

From: Lieven Govaerts <svnlgo_at_mobsol.be>
Date: 2006-11-12 21:44:49 CET

Ivan Zhakov wrote:
> On 11/9/06, Lieven Govaerts <svnlgo@mobsol.be> wrote:
>> Quoting Ivan Zhakov <chemodax@gmail.com>:
>>
>> > On 11/9/06, Ivan Zhakov <chemodax@gmail.com> wrote:
>> > > On 11/6/06, Lieven Govaerts <lgo@mobsol.be> wrote:
>> > > >
>> > > > Attached patch is my work in progress towards a parallelized python
>> > > > test suite. It's not completely finished, basically it needs some
>> > > > more testing, some fixes (not all the tests are passing), code
>> > > > cleanup and a log message. But, I send it to the list anyway, so
>> > > > people interested in this patch can already have a look at it.
>> > > Hi Lieven,
>> > > Great work! For present time I'm switching to Mac, so I've tested
>> > > your patch:
>> > > [[
>> > > chmbook:~/Subversion/trunk/subversion/tests/cmdline ivan$ ./basic_tests.py -p
>> > > ................................Traceback (most recent call last):
>> > >   File "./basic_tests.py", line 1773, in ?
>> > >     svntest.main.run_tests(test_list)
>> > >   File "/Users/ivan/Subversion/trunk/subversion/tests/cmdline/svntest/main.py", line 996, in run_tests
>> > >     exit_code = _internal_run_tests(test_list, testnums, parallel)
>> > >   File "/Users/ivan/Subversion/trunk/subversion/tests/cmdline/svntest/main.py", line 868, in _internal_run_tests
>> > >     finished_tests.sort(key=lambda test: test.index)
>> > > TypeError: sort() takes no keyword arguments
>> > > ]]
>> > >
>> > > Maybe it's because it doesn't support Python 2.3?
>>
>> The key argument to sort() was introduced in 2.4. I have alternative
>> code, I'll change the patch to use that.
>>
>> > Also catch my addition to your patch to implement make check PARALLEL
>> > command to test on non-windows platform.
>> >
>> Yes, thanks for that, I was hoping for someone to add that :)
>>
>> I have to change some things in the sandbox set up code, running
>> basic_tests on ra_dav in parallel doesn't work (see the other mail
>> thread: "[svn] x [fsfs] basic_tests.py failures").
>>
> Yes, I've seen that thread. I think it's a good idea to send your
> updated patch with my changes to the list.
>
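For reference, the TypeError quoted above comes from the key= keyword of list.sort(), which Python 2.3 does not accept. A minimal sketch of the usual workaround (decorate, sort, undecorate) follows. The Result class is a hypothetical stand-in for a finished test object, and the sketch is written for a current Python interpreter, not for 2.3:

```python
class Result:
    """Hypothetical stand-in for a finished test; only `index` matters here."""

    def __init__(self, index):
        self.index = index


def sort_by_index(results):
    # Decorate: pair every object with its sort key. The position is added
    # as a tie-breaker so that the objects themselves are never compared.
    decorated = [(r.index, pos, r) for pos, r in enumerate(results)]
    # Sort the tuples; this needs no keyword arguments.
    decorated.sort()
    # Undecorate: drop the key and the position again.
    return [r for (_index, _pos, r) in decorated]


finished = [Result(3), Result(1), Result(2)]
print([r.index for r in sort_by_index(finished)])
```

On interpreters that do support it, finished.sort(key=lambda r: r.index) gives the same ordering in one line.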
Attached is the new version of the patch. Note that some commits have
been made to trunk in the meantime, which solve some problems with tests
behaving badly when run in parallel.

The new patch allows you to specify the number of tests run in parallel;
it still defaults to 10.
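To illustrate the throttling idea in isolation, here is a minimal sketch of a polling loop that keeps at most `limit` worker threads busy. It is not the code from the patch: Worker, run_throttled, and the use of plain callables as jobs are assumptions made for the example, and it targets a current Python interpreter.

```python
import threading
import time


class Worker(threading.Thread):
    """Runs one job and registers itself in a shared list when done."""

    def __init__(self, index, job, finished):
        threading.Thread.__init__(self)
        self.index = index
        self.job = job
        self.finished = finished
        self.result = None

    def run(self):
        try:
            self.result = self.job()
        except Exception as err:
            # Keep the error as the result instead of losing the worker.
            self.result = err
        finally:
            # Always report completion, so the parent never waits forever.
            # list.append is atomic in CPython, so no extra lock is used.
            self.finished.append(self)


def run_throttled(jobs, limit, poll=0.01):
    finished = []
    started = 0
    for index, job in enumerate(jobs):
        # Wait for a free slot; ">=" keeps at most `limit` jobs running.
        while started - len(finished) >= limit:
            time.sleep(poll)
        Worker(index, job, finished).start()
        started += 1
    # Wait for the remaining workers.
    while len(finished) < started:
        time.sleep(poll)
    # Report results in submission order, not completion order.
    finished.sort(key=lambda w: w.index)
    return [w.result for w in finished]


print(run_throttled([lambda n=n: n * n for n in range(5)], limit=2))
```

Registering the worker in a finally clause is a deliberate choice in this sketch: a worker that raises still counts as finished, so the waiting loop in the parent can terminate.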

One problem: I've tested this on my Mac buildslave, and it always fails
in some test. Sometimes a child process appears to be hanging, thereby
generating an exception in the parent process. If I don't kill the child
process, it will hang forever (well, a long time at least). This
exception is shown in the logs *before* I kill the child process:
Exception in thread Thread-47:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/threading.py", line 442, in __bootstrap
    self.run()
  File "/Users/lgo/slavedir/osx10.4-gcc4.0.1-ia32/build/subversion/tests/cmdline/svntest/main.py", line 724, in run
    self.result, self.stdout_lines, self.stderr_lines =\
  File "/Users/lgo/slavedir/osx10.4-gcc4.0.1-ia32/build/subversion/tests/cmdline/svntest/main.py", line 286, in spawn_process
    pid, wait_code = os.wait()
OSError: [Errno 10] No child processes
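
One possible explanation, not confirmed anywhere in this thread: os.wait() returns as soon as any child of the process exits, so when several threads each spawn a child and then call os.wait(), one thread can collect the exit status of a child that another thread started. The other thread then has nothing left to wait for and gets "No child processes". A minimal sketch of the alternative, where each thread waits only for the child it started, is shown below. It uses the subprocess module (not available in Python 2.3), and run_child and run_all are hypothetical names:

```python
import subprocess
import sys
import threading


def run_child(argv, results, slot):
    # Popen keeps track of this child's pid, so communicate() waits for
    # exactly this process and cannot reap a child of another thread.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() also drains both pipes, which avoids a deadlock when
    # the child writes more than the OS is willing to buffer.
    out, err = proc.communicate()
    results[slot] = (proc.returncode, out, err)


def run_all(commands):
    results = [None] * len(commands)
    threads = [threading.Thread(target=run_child, args=(argv, results, i))
               for i, argv in enumerate(commands)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


commands = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n]
            for n in range(3)]
print([code for (code, _out, _err) in run_all(commands)])
```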

Lieven

Index: build/run_tests.py
===================================================================
--- build/run_tests.py (revision 22271)
+++ build/run_tests.py (working copy)
@@ -4,12 +4,13 @@
#

'''usage: python run_tests.py [--url=<base-url>] [--fs-type=<fs-type>]
-                    [--verbose] [--cleanup] [--enable-sasl]
+                    [--verbose] [--cleanup] [--enable-sasl]
+                    [--parallel=<# processes>]
                     <abs_srcdir> <abs_builddir>
                     <prog ...>

-The optional base-url, fs-type, verbose, and cleanup options, and
-the first two parameters are passed unchanged to the TestHarness
+The optional base-url, fs-type, verbose, cleanup and parallel options,
+and the first two parameters are passed unchanged to the TestHarness
constructor. All other parameters are names of test programs.
'''

@@ -27,7 +28,7 @@

   def __init__(self, abs_srcdir, abs_builddir, logfile,
                base_url=None, fs_type=None, verbose=None, cleanup=None,
-               enable_sasl=None):
+               enable_sasl=None, parallel=None):
     '''Construct a TestHarness instance.

     ABS_SRCDIR and ABS_BUILDDIR are the source and build directories.
@@ -43,6 +44,7 @@
     self.verbose = verbose
     self.cleanup = cleanup
     self.enable_sasl = enable_sasl
+    self.parallel = parallel
     self.log = None

   def run(self, list):
@@ -114,6 +116,8 @@
       cmdline.append('--cleanup')
     if self.fs_type is not None:
       cmdline.append(quote('--fs-type=' + self.fs_type))
+    if self.parallel is not None:
+      cmdline.append('--parallel=' + self.parallel)

     old_cwd = os.getcwd()
     try:
@@ -165,9 +169,9 @@

def main():
   try:
-    opts, args = my_getopt(sys.argv[1:], 'u:f:vc',
+    opts, args = my_getopt(sys.argv[1:], 'u:f:vcp:',
                            ['url=', 'fs-type=', 'verbose', 'cleanup',
-                            'enable-sasl'])
+                            'enable-sasl', 'parallel='])
   except getopt.GetoptError:
     args = []

@@ -175,8 +179,9 @@
     print __doc__
     sys.exit(2)

-  base_url, fs_type, verbose, cleanup, enable_sasl = None, None, None, None, \
-                                                     None
+  base_url, fs_type, verbose = None, None, None
+  cleanup, enable_sasl, parallel = None, None, None
+
   for opt, val in opts:
     if opt in ('-u', '--url'):
       base_url = val
@@ -188,12 +193,14 @@
       cleanup = 1
     elif opt in ('--enable-sasl'):
       enable_sasl = 1
+    elif opt in ('-p', '--parallel'):
+      parallel = val
     else:
       raise getopt.GetoptError

   th = TestHarness(args[0], args[1],
                    os.path.abspath('tests.log'),
-                   base_url, fs_type, verbose, cleanup, enable_sasl)
+                   base_url, fs_type, verbose, cleanup, enable_sasl, parallel)

   failed = th.run(args[2:])
   if failed:
Index: Makefile.in
===================================================================
--- Makefile.in (revision 22271)
+++ Makefile.in (working copy)
@@ -353,6 +353,7 @@
# "make check CLEANUP=true" will clean up directories for successful tests.
# "make check TESTS=subversion/tests/cmdline/basic_tests.py"
# will perform only basic tests (likewise for other tests).
+# "make check PARALLEL=10" will run 10 python tests in parallel.
check: $(TEST_DEPS) @BDB_TEST_DEPS@
         @if test "$(PYTHON)" != "none"; then \
           flags="--verbose"; \
@@ -368,6 +369,9 @@
           if test "$(ENABLE_SASL)" != ""; then \
             flags="--enable-sasl $$flags"; \
           fi; \
+          if test "$(PARALLEL)" != ""; then \
+            flags="--parallel $(PARALLEL) $$flags"; \
+          fi; \
           $(PYTHON) $(top_srcdir)/build/run_tests.py $$flags \
                     '$(abs_srcdir)' '$(abs_builddir)' $(TESTS); \
         else \
Index: subversion/tests/cmdline/authz_tests.py
===================================================================
--- subversion/tests/cmdline/authz_tests.py (revision 22271)
+++ subversion/tests/cmdline/authz_tests.py (working copy)
@@ -718,7 +718,7 @@
              ]

if __name__ == '__main__':
-  svntest.main.run_tests(test_list)
+  svntest.main.run_tests(test_list, serial_only = True)
   # NOTREACHED

Index: subversion/tests/cmdline/svnsync_tests.py
===================================================================
--- subversion/tests/cmdline/svnsync_tests.py (revision 22271)
+++ subversion/tests/cmdline/svnsync_tests.py (working copy)
@@ -460,7 +460,7 @@
              ]

if __name__ == '__main__':
-  svntest.main.run_tests(test_list)
+  svntest.main.run_tests(test_list, serial_only = True)
   # NOTREACHED

Index: subversion/tests/cmdline/svntest/main.py
===================================================================
--- subversion/tests/cmdline/svntest/main.py (revision 22271)
+++ subversion/tests/cmdline/svntest/main.py (working copy)
@@ -24,6 +24,7 @@
import copy # for deepcopy()
import time # for time()
import traceback # for print_exc()
+import threading

import getopt
try:
@@ -134,6 +135,10 @@
# Global variable indicating if svnserve should use Cyrus SASL
enable_sasl = 0

+# Global variable indicating if this is a child process and no cleanup
+# of global directories is needed.
+is_child_process = 0
+
# Global URL to testing area. Default to ra_local, current working dir.
test_area_url = file_scheme_prefix + os.path.abspath(os.getcwd())
if windows == 1:
@@ -245,16 +250,7 @@
                            None, *varargs)

# Run any binary, supplying input text, logging the command line
-def run_command_stdin(command, error_expected, binary_mode=0,
-                      stdin_lines=None, *varargs):
-  """Run COMMAND with VARARGS; input STDIN_LINES (a list of strings
-  which should include newline characters) to program via stdin - this
-  should not be very large, as if the program outputs more than the OS
-  is willing to buffer, this will deadlock, with both Python and
-  COMMAND waiting to write to each other for ever.
-  Return stdout, stderr as lists of lines.
-  If ERROR_EXPECTED is None, any stderr also will be printed."""
-
+def spawn_process(command, binary_mode=0,stdin_lines=None, *varargs):
   args = ''
   for arg in varargs: # build the command string
     arg = str(arg)
@@ -271,7 +267,6 @@
   else:
     mode = 't'

-  start = time.time()
   infile, outfile, errfile = os.popen3(command + args, mode)

   if stdin_lines:
@@ -285,6 +280,8 @@
   outfile.close()
   errfile.close()

+  exit_code = 0
+
   if platform_with_os_wait:
     pid, wait_code = os.wait()

@@ -294,7 +291,27 @@
     if exit_signal != 0:
       raise SVNProcessTerminatedBySignal

+  return exit_code, stdout_lines, stderr_lines
+
+def run_command_stdin(command, error_expected, binary_mode=0,
+                      stdin_lines=None, *varargs):
+  """Run COMMAND with VARARGS; input STDIN_LINES (a list of strings
+  which should include newline characters) to program via stdin - this
+  should not be very large, as if the program outputs more than the OS
+  is willing to buffer, this will deadlock, with both Python and
+  COMMAND waiting to write to each other for ever.
+  Return stdout, stderr as lists of lines.
+  If ERROR_EXPECTED is None, any stderr also will be printed."""
+
   if verbose_mode:
+    start = time.time()
+
+  exit_code, stdout_lines, stderr_lines = spawn_process(command,
+                                                        binary_mode,
+                                                        stdin_lines,
+                                                        *varargs)
+
+  if verbose_mode:
     stop = time.time()
     print '<TIME = %.6f>' % (stop - start)

@@ -677,7 +694,38 @@
       print "WARNING: cleanup failed, will try again later"
     _deferred_test_paths.append(path)

+class SpawnTest(threading.Thread):
+  def __init__(self, index, tests = None):
+    threading.Thread.__init__(self)
+    self.index = index
+    self.tests = tests
+    self.result = None
+    self.stdout_lines = None
+    self.stderr_lines = None

+  def run(self):
+    command = sys.argv[0]
+
+    args = []
+    args.append(str(self.index))
+    args.append('-c')
+    # add some startup arguments from this process
+    if fs_type:
+      args.append('--fs-type=' + fs_type)
+    if test_area_url:
+      args.append('--url=' + test_area_url)
+    if verbose_mode:
+      args.append('-v')
+    if cleanup_mode:
+      args.append('--cleanup')
+    if enable_sasl:
+      args.append('--enable-sasl')
+
+    self.result, self.stdout_lines, self.stderr_lines =\
+                                         spawn_process(command, 1, None, *args)
+    sys.stdout.write('.')
+    self.tests.append(self)
+
class TestRunner:
   """Encapsulate a single test case (predicate), including logic for
   runing the test and test list output."""
@@ -759,7 +807,6 @@
       sandbox.cleanup_test_paths()
     return result

-
######################################################################
# Main testing functions

@@ -770,27 +817,65 @@
# it can be displayed by the 'list' command.)

# Func to run one test in the list.
-def run_one_test(n, test_list):
+def run_one_test(n, test_list, parallel = 0, finished_tests = None):
   "Run the Nth client test in TEST_LIST, return the result."
+#TODO: add comment

   if (n < 1) or (n > len(test_list) - 1):
     print "There is no test", `n` + ".\n"
     return 1

   # Run the test.
-  exit_code = TestRunner(test_list[n], n).run()
-  return exit_code
+  if parallel:
+    st = SpawnTest(n, finished_tests)
+    st.start()
+    return 0
+  else:
+    exit_code = TestRunner(test_list[n], n).run()
+    return exit_code

-def _internal_run_tests(test_list, testnums):
+def _internal_run_tests(test_list, testnums, parallel):
   """Run the tests from TEST_LIST whose indices are listed in TESTNUMS."""
+#TODO: add comment about parallel

   exit_code = 0
+  finished_tests = []
+  tests_started = 0

-  for testnum in testnums:
-    # 1 is the only return code that indicates actual test failure.
-    if run_one_test(testnum, test_list) == 1:
-      exit_code = 1
+  if not parallel:
+    for testnum in testnums:
+      if run_one_test(testnum, test_list) == 1:
+        exit_code = 1
+  else:
+    for testnum in testnums:
+      # wait till there's a free spot.
+      while tests_started - len(finished_tests) >= parallel:
+        time.sleep(0.2)
+      run_one_test(testnum, test_list, parallel, finished_tests)
+      tests_started += 1

+    # wait for all tests to finish
+    while len(finished_tests) < len(testnums):
+      time.sleep(0.2)
+
+    # Sort test results list by test nr.
+    deco = [(test.index, test) for test in finished_tests]
+    deco.sort()
+    finished_tests = [test for (ti, test) in deco]
+
+    print
+
+    # all tests are finished, find out the result and print the logs.
+    for test in finished_tests:
+      if test.stdout_lines:
+        for line in test.stdout_lines:
+          sys.stdout.write(line)
+      if test.stderr_lines:
+        for line in test.stderr_lines:
+          sys.stdout.write(line)
+      if test.result == 1:
+        exit_code = 1
+
   _cleanup_deferred_test_paths()
   return exit_code

@@ -807,7 +892,7 @@
#
# [<testnum>]... : the numbers of the tests that should be run. If no
# testnums are specified, then all tests in TEST_LIST are run.
-def run_tests(test_list):
+def run_tests(test_list, serial_only = False):
   """Main routine to run all tests in TEST_LIST.

   NOTE: this function does not return. It does a sys.exit() with the
@@ -820,13 +905,17 @@
   global verbose_mode
   global cleanup_mode
   global enable_sasl
+  global is_child_process
+
   testnums = []
   # Should the tests be listed (as opposed to executed)?
   list_tests = 0

-  opts, args = my_getopt(sys.argv[1:], 'v',
+  parallel = 0
+
+  opts, args = my_getopt(sys.argv[1:], 'vp:c',
                          ['url=', 'fs-type=', 'verbose', 'cleanup', 'list',
-                          'enable-sasl'])
+                          'enable-sasl', 'parallel='])

   for arg in args:
     if arg == "list":
@@ -856,21 +945,44 @@
     elif opt == "--enable-sasl":
       enable_sasl = 1

+    elif opt == '-p' or opt == "--parallel":
+      if val:
+        parallel = int(val)
+      else:
+        parallel = 10
+
+    elif opt == '-c':
+      is_child_process = 1
+
   if test_area_url[-1:] == '/': # Normalize url to have no trailing slash
     test_area_url = test_area_url[:-1]

+  ######################################################################
+  # Initialization
+
+  # Cleanup: if a previous run crashed or interrupted the python
+  # interpreter, then `temp_dir' was never removed. This can cause wonkiness.
+  if not is_child_process:
+    safe_rmtree(temp_dir)
+
   # Calculate pristine_url from test_area_url.
   pristine_url = test_area_url + '/' + pristine_dir
   if windows == 1:
     pristine_url = string.replace(pristine_url, '\\', '/')

   # Setup the pristine repository (and working copy)
+  #if not is_child_process:
   actions.setup_pristine_repository()

   if not testnums:
     # If no test numbers were listed explicitly, include all of them:
     testnums = range(1, len(test_list))

+  # don't run tests in parallel when the tests don't support it or there
+  # are only a few tests to run.
+  if serial_only or len(testnums) < 2:
+    parallel = 0
+
   if list_tests:
     print "Test # Mode Test Description"
     print "------ ----- ----------------"
@@ -881,11 +993,12 @@
     sys.exit(0)

   else:
-    exit_code = _internal_run_tests(test_list, testnums)
+    exit_code = _internal_run_tests(test_list, testnums, parallel)

     # remove all scratchwork: the 'pristine' repository, greek tree, etc.
     # This ensures that an 'import' will happen the next time we run.
-    safe_rmtree(temp_dir)
+    if not is_child_process:
+      safe_rmtree(temp_dir)

     _cleanup_deferred_test_paths()

@@ -893,14 +1006,6 @@
     sys.exit(exit_code)

-######################################################################
-# Initialization
-
-# Cleanup: if a previous run crashed or interrupted the python
-# interpreter, then `temp_dir' was never removed. This can cause wonkiness.
-
-safe_rmtree(temp_dir)
-
# the modules import each other, so we do this import very late, to ensure
# that the definitions in "main" have been completed.
import actions
Index: win-tests.py
===================================================================
--- win-tests.py (revision 22271)
+++ win-tests.py (working copy)
@@ -32,7 +32,7 @@
   print " -v, --verbose : talk more"
   print " -f, --fs-type=type : filesystem type to use (fsfs is default)"
   print " -c, --cleanup : cleanup after running a test"
-
+  print " -p #, --parallel=# : run # tests in parallel"
   print " --svnserve-args=list : comma-separated list of arguments for"
   print " svnserve"
   print " default is '-d,-r,<test-path-root>'"
@@ -57,15 +57,15 @@
client_tests = filter(lambda x: x.startswith(CMDLINE_TEST_SCRIPT_PATH),
                       all_tests)

-opts, args = my_getopt(sys.argv[1:], 'hrdvcu:f:',
+opts, args = my_getopt(sys.argv[1:], 'hrdvcp:u:f:',
                        ['release', 'debug', 'verbose', 'cleanup', 'url=',
                         'svnserve-args=', 'fs-type=', 'asp.net-hack',
-                        'httpd-dir=', 'httpd-port=', 'help'])
+                        'httpd-dir=', 'httpd-port=', 'help', 'parallel='])
if len(args) > 1:
   print 'Warning: non-option arguments after the first one will be ignored'

# Interpret the options and set parameters
-base_url, fs_type, verbose, cleanup = None, None, None, None
+base_url, fs_type, verbose, cleanup, parallel = None, None, None, None, None
repo_loc = 'local repository.'
objdir = 'Debug'
log = 'tests.log'
@@ -85,6 +85,8 @@
     verbose = 1
   elif opt in ('-c', '--cleanup'):
     cleanup = 1
+  elif opt in ('-p', '--parallel'):
+    parallel = val
   elif opt in ['-r', '--release']:
     objdir = 'Release'
   elif opt in ['-d', '--debug']:
@@ -430,7 +432,8 @@
import run_tests
th = run_tests.TestHarness(abs_srcdir, abs_builddir,
                            os.path.join(abs_builddir, log),
-                           base_url, fs_type, 1, cleanup)
+                           base_url, fs_type, 1, cleanup,
+                           None, parallel)
old_cwd = os.getcwd()
try:
   os.chdir(abs_builddir)

Received on Sun Nov 12 21:45:19 2006