On 8/17/07, cmpilato@tigris.org <cmpilato@tigris.org> wrote:
> -def shard(path, max_files_per_shard):
> - """Move all the files in PATH into subdirectories of PATH named such that
> - subdirectory '0' contains at most MAX_FILES_PER_SHARD files, those named
> - [0, MAX_FILES_PER_SHARD). Abort if PATH is found to contain any entries
> - with non-numeric names."""
> +def shard(path, max_files_per_shard, start, end):
> + """Move the files for revisions START to END inclusive in PATH into
> + subdirectories of PATH named such that subdirectory '0' contains at most
> + MAX_FILES_PER_SHARD files, those named [0, MAX_FILES_PER_SHARD). Abort if
> + PATH is found to contain any entries with non-numeric names."""
>
> - # Move all entries into shards named N.shard.
> - for name in os.listdir(path):
> - try:
> - rev = int(name)
> - except ValueError, OverflowError:
> - print >> sys.stderr, "error: file '%s' does not represent a revision." \
> - % os.path.join(path, name)
> - sys.exit(1)
I think this breaks the guarantee in the doc string that it will abort
if a non-numeric entry is found. (I think that's reasonable? But the
doc string should be changed at least.)
> + tmp = path + '.reshard'
> + try:
> + os.mkdir(tmp)
> + except OSError, e:
> + if e.errno != EEXIST:
> + raise
>
> + # Move all entries into shards named N.shard.
> + for rev in xrange(start, end + 1):
> + name = str(rev)
> shard = rev // max_files_per_shard
> shard_name = str(shard) + '.shard'
>
> from_path = os.path.join(path, name)
> - to_path = os.path.join(path, shard_name, name)
> + to_path = os.path.join(tmp, shard_name, name)
> try:
> os.rename(from_path, to_path)
> except OSError:
> # The most likely explanation is that the shard directory doesn't
> # exist. Let's create it and retry the rename.
> - os.mkdir(os.path.join(path, shard_name))
> + os.mkdir(os.path.join(tmp, shard_name))
> os.rename(from_path, to_path)
>
> # Now rename all the shards to remove the suffix.
> - for name in os.listdir(path):
> + skipped = 0
> + for name in os.listdir(tmp):
> if not name.endswith('.shard'):
> print >> sys.stderr, "warning: ignoring unexpected subdirectory '%s'." \
> - % os.path.join(path, name)
> + % os.path.join(tmp, name)
> + skipped += 1
> continue
> - from_path = os.path.join(path, name)
> - to_path = from_path[:-6]
> + from_path = os.path.join(tmp, name)
> + to_path = os.path.join(path, os.path.basename(from_path)[:-6])
> os.rename(from_path, to_path)
> + skipped == 0 and os.rmdir(tmp)
Is this a common Pythonism? "if skipped == 0: os.rmdir(tmp)" certainly
makes me think less. Also, the log message implies that parallel
reshardings are reasonable, but it doesn't seem to me that this will
work (it definitely will be strange if the parallel reshardings would
affect the same shard, but even if they are building distinct shards,
this loop doesn't make sure that the shard it is installing is in the
range that it processed, so it might install an incomplete shard from
another process).
--dave
--
David Glasser | glasser_at_davidglasser.net | http://www.davidglasser.net/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Aug 17 23:28:18 2007