Re: svn commit: rev 5750 - branches/cvs2svn-kfogel/tools/cvs2svn

From: Greg Stein <gstein_at_lyra.org>
Date: 2003-04-29 00:44:07 CEST

On Mon, Apr 28, 2003 at 03:27:48PM -0500, kfogel@tigris.org wrote:
>...
> +++ branches/cvs2svn-kfogel/tools/cvs2svn/cvs2svn.py Mon Apr 28 15:27:44 2003
>...
> +def ensure_directories(path, root, dumpfile):
> +  """Output to DUMPFILE any intermediate directories in PATH that are
> +  not already present under node ROOT, adding them to ROOT's tree as
> +  we go.  Return the last parent directory, that is, the parent
> +  of PATH's basename.  Leading slash(es) on PATH are optional."""
> +  path = path.lstrip('/')

This string method usage requires Python 2.0 (and lstrip() only accepts a
characters argument as of 2.2.2), whereas cvs2svn.py used to require only
Python 1.5.2. Once cvs2svn drops the bindings, sticking to 1.5.2 will be
even easier (while our bindings can work against 1.5.2, I'm not sure how
many people try that).

So... that would change to something like this, since string.lstrip() did
not take a second argument before 2.2.3:

  while path[:1] == '/':
    path = path[1:]

> +  path_so_far = None
> +  components = string.split(path, '/')

One of the things that I like to do is:

components = filter(None, string.split(path, '/'))

That filter() usage will filter out all empty strings, which means it
ignores leading and trailing slashes, and double-slashes. Kinda nifty :-)
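
To make the effect concrete, here is a small sketch in present-day Python 3
rather than 1.5.2 (filter() now returns an iterator, so it is wrapped in
list(), and str.split stands in for string.split). The helper name
split_path is made up for illustration.

```python
def split_path(path):
    """Split PATH on '/' and drop empty components (hypothetical helper)."""
    # filter(None, ...) keeps only truthy items, so the '' entries produced
    # by leading, trailing, or doubled slashes disappear.
    return list(filter(None, path.split('/')))


print(split_path('/trunk//tools/cvs2svn/'))  # ['trunk', 'tools', 'cvs2svn']
```

Note that a path consisting only of slashes comes back as an empty list,
so callers need to cope with that case.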

> +  last_idx = len(components) - 1
> +  this_node = root
> +
> +  i = 0
> +  while (i < last_idx):
> +
> +    component = components[i]

I don't think you need the "i" logic. Since the loop stops before the last
component (the basename), just do:

  for component in components[:-1]:

> +    if path_so_far:
> +      path_so_far += '/' + component

The += construct (augmented assignment) also requires Python 2.0. For 1.5.2,
spell it out as path_so_far = path_so_far + '/' + component.
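
Putting the loop comments together, a rough sketch of the directory walk
might look like the following. It is written for present-day Python 3, and
it makes assumptions: a plain nested dict stands in for the real node tree,
nothing is written to a dumpfile, and the function name is invented here.

```python
def ensure_directories_sketch(path, tree):
    """Add PATH's intermediate directories to TREE; return the new paths.

    TREE is a plain nested dict standing in for the real node tree
    (an assumption made for this sketch).
    """
    components = [c for c in path.split('/') if c]
    created = []
    path_so_far = None
    node = tree
    # Skip the last component: it is the basename, not a directory.
    for component in components[:-1]:
        if path_so_far:
            # Spelled without '+=' to mirror the 1.5.2-safe form.
            path_so_far = path_so_far + '/' + component
        else:
            path_so_far = component
        if component not in node:
            node[component] = {}
            created.append(path_so_far)
        node = node[component]
    return created


tree = {}
print(ensure_directories_sketch('/a/b/c.txt', tree))  # ['a', 'a/b']
```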

>...
> +def get_md5(path):
> +  """Return the hex md5 digest of file PATH."""
> +  f = open(path, 'r')
> +  checksum = md5.new()
> +  buf = f.read(102400)
> +  while buf:
> +    checksum.update(buf)
> +    buf = f.read(102400)
> +  f.close()
> +  return checksum.hexdigest()

There isn't a real need to close the file: in CPython it is closed as soon
as "f" goes out of scope and its reference count drops to zero.

>...
> + # Make the dumper's temp directory for this run. RCS working
> + # files get checked out into here.
> + os.mkdir(self.tmpdir)

Couldn't you use 'co -p' and avoid temp files altogether?

>...
> + # Anything ending in ".1" is a new file.
> + #
> + # ### We could also use the parent_node to determine this.
> + # ### Maybe we should, too, because ".1" is not perfectly
> + # ### reliable, because of 'cvs commit -r'...
> + if re.match('.*\\.1$', cvs_rev):

Bah. Perl-style overkill :-)

if cvs_rev[-2:] == '.1':
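
As a quick sanity check that the slice behaves like the regex on typical
revision strings, here is a small sketch in present-day Python 3. The
helper name is_first_revision and the sample revisions are made up for
illustration.

```python
import re


def is_first_revision(cvs_rev):
    """True if CVS_REV ends in '.1' (hypothetical helper name)."""
    return cvs_rev[-2:] == '.1'


# The slice and the original regex agree on these sample revisions.
for rev in ('1.1', '1.2.2.1', '1.10', '1.11', '1'):
    assert is_first_revision(rev) == bool(re.match('.*\\.1$', rev))
```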

>...
> + self.dumpfile.write('Node-path: %s\n' % svn_path)
> + self.dumpfile.write('Node-kind: file\n')
> + self.dumpfile.write('Node-action: %s\n' % action)
> + self.dumpfile.write('Prop-content-length: %d\n' % 10) ### svn:executable?
> + self.dumpfile.write('Text-content-length: %d\n' % os.path.getsize(working))
> + self.dumpfile.write('Text-content-md5: %s\n' % get_md5(working))
> + self.dumpfile.write('Content-length: %d\n' % 0) # todo
> + self.dumpfile.write('\n')
> + self.dumpfile.write('PROPS-END\n')

Might be interesting to have a little helper function/object that writes out
all this stuff, and can be shared between here and the Node class.
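
Such a helper could be as small as the sketch below (present-day Python 3).
The function name and its signature are invented here, and the header names
are simply the ones from the quoted code; this is not meant as a complete
description of the dumpfile format.

```python
import io


def write_node_headers(dumpfile, path, kind, action, headers=()):
    """Write a node's header block to DUMPFILE, followed by a blank line.

    HEADERS is a sequence of (name, value) pairs for any extra headers.
    The name and signature are hypothetical.
    """
    dumpfile.write('Node-path: %s\n' % path)
    dumpfile.write('Node-kind: %s\n' % kind)
    dumpfile.write('Node-action: %s\n' % action)
    for name, value in headers:
        dumpfile.write('%s: %s\n' % (name, value))
    dumpfile.write('\n')


out = io.StringIO()
write_node_headers(out, 'trunk/foo.c', 'file', 'add',
                   [('Text-content-length', 42)])
print(out.getvalue())
```

Both the file-dumping code and the Node class could then call the same
function and only differ in the extra headers they pass.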

> + ### This is a pity. We already ran over all the file's bytes to
> + ### get the checksum, now we have to do it again to insert the
> + ### file's contents into the dumpstream? What a lose.
> + ###
> + ### A solution: write '00000000000000000000000000000000' for the
> + ### initial checksums, then go back and patch them up after the
> + ### entire dumpfile has been written. (We'd calculate the
> + ### checksum as we get each file's contents, record it somewhere,
> + ### and look it up during the patchup phase.)
> + ###
> + ### We could also use the `md5sum' utility to get the checksum,
> + ### and the OS's file append capability to append the file. But
> + ### then we'd have a dependency on `md5sum' (yuck); and how would
> + ### our open filehandle interact with data being inserted behind
> + ### its back? Not very well, I imagine.

Right. This plays into the use of 'co -p', too. Nominally, it would be great
to be able to do:

  f = os.popen('co -p ...')
  checksum = md5.new()
  buf = f.read(CHUNK_SIZE)
  while buf:
    checksum.update(buf)
    dumpfile.write(buf)
    buf = f.read(CHUNK_SIZE)
  digest = checksum.hexdigest()
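
For reference, a runnable rendering of that loop in present-day Python 3
might look like this. It assumes subprocess and hashlib in place of
os.popen and the md5 module, takes the command as a parameter, and uses a
stand-in command for the demonstration rather than a real 'co -p'
invocation. The function name is invented.

```python
import hashlib
import io
import subprocess
import sys

CHUNK_SIZE = 102400  # same chunk size as the quoted get_md5()


def copy_with_md5(cmd, dumpfile):
    """Run CMD, copy its stdout to DUMPFILE, and return the hex md5."""
    checksum = hashlib.md5()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    buf = proc.stdout.read(CHUNK_SIZE)
    while buf:
        checksum.update(buf)   # checksum and copy in a single pass
        dumpfile.write(buf)
        buf = proc.stdout.read(CHUNK_SIZE)
    proc.stdout.close()
    proc.wait()
    return checksum.hexdigest()


# Stand-in command for the demo; a real run would pass the 'co -p' command.
out = io.BytesIO()
cmd = [sys.executable, '-c', 'import sys; sys.stdout.write("hello")']
digest = copy_with_md5(cmd, out)
```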

The problem is going back. You can use dumpfile.tell() to get a seek
position, and then do the backwards seek stuff. Another alternative would be
taking a cue from HTTP and having the notion of "trailing" headers. The
checksum would come in an RFC822-ish "header block" *after* the content.

Oh, and your "patchup phase" wouldn't be that complicated. Just something
like:

  dumpfile.write('Checksum: ')
  pos = dumpfile.tell()
  dumpfile.write(CHECKSUM_PLACEHOLDER + '\n')
  ...
  copy content
  ...
  dumpfile.seek(pos, 0)
  dumpfile.write(checksum.hexdigest())
  dumpfile.seek(0, 2)

You could also do:

  import FCNTL
  ...
  dumpfile.seek(..., FCNTL.SEEK_SET)
  ...
  dumpfile.seek(..., FCNTL.SEEK_END)

rather than 0 or 2.
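
A self-contained sketch of the patchup idea, in present-day Python 3, might
look like this. It assumes a seekable dumpfile opened in binary mode, uses
hashlib and the os.SEEK_* constants in place of md5 and FCNTL, and keeps
the 'Checksum' header name from the snippet above purely for illustration.
The function name is invented.

```python
import hashlib
import io
import os

CHECKSUM_PLACEHOLDER = b'0' * 32  # same width as a hex md5 digest


def write_with_patched_checksum(dumpfile, chunks):
    """Write a placeholder checksum, the content, then patch the checksum."""
    dumpfile.write(b'Checksum: ')
    pos = dumpfile.tell()              # remember where the digest goes
    dumpfile.write(CHECKSUM_PLACEHOLDER + b'\n')
    checksum = hashlib.md5()
    for chunk in chunks:
        checksum.update(chunk)
        dumpfile.write(chunk)
    dumpfile.seek(pos, os.SEEK_SET)    # seek(offset, whence)
    dumpfile.write(checksum.hexdigest().encode('ascii'))
    dumpfile.seek(0, os.SEEK_END)      # back to the end for further writes


out = io.BytesIO()
write_with_patched_checksum(out, [b'hello ', b'world'])
```

This only works because the placeholder and the real digest have the same
length, so the overwrite never disturbs the bytes that follow.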

>...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Received on Tue Apr 29 00:43:32 2003
