
Re: svn commit: rev 5750 - branches/cvs2svn-kfogel/tools/cvs2svn

From: Greg Stein <gstein_at_lyra.org>
Date: 2003-04-29 01:56:28 CEST

On Mon, Apr 28, 2003 at 05:51:44PM -0500, Karl Fogel wrote:
>...
> > So... that would change to:
> >
> > path = string.lstrip(path, '/')
>
> Ah, hmmm. I was wondering about that. The above usage is what I
> tried first, but it gave an error in Python 2.2.2... What to do?

Whoops. It errors in 1.5.2, also. lstrip() just removes leading whitespace.
You can't give it other characters to remove.

Ah well. You like the filter() idiom, which does what you want anyways.
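
A minimal sketch of the filter() idiom mentioned above. The exact code in
the commit is not shown in this thread, so the function names here are
hypothetical, and the sketch is written for a current Python interpreter
rather than 1.5.2 (it uses string methods, which 1.5.2 lacks).

```python
def split_path(path):
    # Splitting on '/' yields empty strings for leading, trailing, or
    # doubled slashes; filter(None, ...) drops those empty strings.
    return list(filter(None, path.split('/')))


def strip_leading_slashes(path):
    # Manual equivalent of stripping leading '/' characters, for
    # interpreters whose lstrip() takes no character argument.
    i = 0
    while i < len(path) and path[i] == '/':
        i = i + 1
    return path[i:]
```

split_path() gives the component list directly, which is usually what the
tree-walking code wants anyway.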

>...
> > > + last_idx = len(components) - 1
> > > + this_node = root
> > > +
> > > + i = 0
> > > + while (i < last_idx):
> > > +
> > > + component = components[i]
> >
> > I don't think you need the "i" logic. Just do:
> >
> > for component in components:
>
> That's what I started with, actually -- but I need to know when I hit
> the component right *before* the last component. I wish loops had a
> 'next_to_last' keyword or something :-).

ah!

  for component in components[:-1]:

That'll create a new slice of the list, excluding the last component.
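
To illustrate the slice, here is a small sketch that walks every component
except the last and then handles the last one separately. The nested dicts
are only a stand-in for the script's Node objects, and insert_path is a
hypothetical name; it assumes a non-empty component list.

```python
def insert_path(root, components):
    # Walk (and create) the intermediate directories: everything
    # except the final component.
    this_node = root
    for component in components[:-1]:
        if component not in this_node:
            this_node[component] = {}
        this_node = this_node[component]
    # The last component is the entry itself.
    this_node[components[-1]] = None
    return root
```

No index variable is needed: the loop body never has to ask whether it is
on the last component, because the slice already left it out.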

> > > + if path_so_far:
> > > + path_so_far += '/' + component
> >
> > The += construct is also Python 2.0 based.
>
> Okay. Mike Pilato also pointed this out, but I didn't realize at the
> time that we were within spitting distance of 1.5.2 compatibility.
> I'll get rid of the "+=" stuff.

Right. "spitting distance" is the key. Since we are so close, we may as well
stay there. There are a lot of Python installations still on 1.5.2 (for
example, all RedHat distros prior to 8.0).
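
For reference, the 1.5.2-safe spelling of the quoted snippet just writes
the assignment out in full. This is a sketch only; join_components is a
hypothetical wrapper, not a function from the script.

```python
def join_components(components):
    path_so_far = ''
    for component in components:
        if path_so_far:
            # Plain assignment instead of "+=", which needs Python 2.0.
            path_so_far = path_so_far + '/' + component
        else:
            path_so_far = component
    return path_so_far
```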

>...
> The other problem is that Node tree. Because Subversion uses paths
> for branches, the full size of the node tree is much greater than just
> the tree one would get from

Whoa! Forgot about that. Yeah... can't keep that in memory.

>...
> (If you have any suggestions for a particularly Python-ish way to do
> it, they are welcome of course. I was just going to use `anydbm' and
> come up with some simple representation, perhaps involving pickling.)

anydbm should be fine. 'marshal' is just fine and very fast, if you stick
with native Python datatypes (list, dict, string, integer, etc). You only
need pickle if you need to serialize class instances. My understanding is
that cPickle is nearly as fast as marshal, but I doubt you'll need much
beyond a "dictionary" that looks like:

  { "path1": None, "path2": None }

And you just test with .has_key(path). (I used None in the example cuz you
don't need a value(?); just the key)
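
A rough sketch of that round trip with marshal, assuming the data really is
just native types. The helper names and sample paths are made up, and the
membership test uses "in" so the sketch runs on a current interpreter
(.has_key() is the older spelling).

```python
import marshal


def dump_paths(paths):
    # Only keys matter, so every value is None.
    table = {}
    for path in paths:
        table[path] = None
    return marshal.dumps(table)


def load_paths(blob):
    return marshal.loads(blob)


blob = dump_paths(["trunk/foo.c", "branches/b1/foo.c"])
restored = load_paths(blob)
```

One caveat: the marshal format is not guaranteed to stay the same between
Python versions, so it suits scratch data better than anything long-lived.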

Be wary of anydbm. If that defaults to dumbdbm, then you're going to end up
with the index in memory. Kaboom! Suggestion:

  import anydbm
  import sys

  # dumbdbm is the pure-Python fallback; refuse to run with it.
  if anydbm._defaultmod.__name__ == 'dumbdbm':
    print 'ERROR: your installation of Python does not contain a proper'
    print ' DBM module. This script cannot continue.'
    print ' to solve: see blah blah blah'
    sys.exit(1)

On my RH 7.2 installation, Python has a BDB module. I'd expect that you'll
be fine with the DBM restriction.
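
A sketch of the on-disk path table itself, written against Python 3, where
anydbm was renamed to dbm. The function names and the file layout are
assumptions for illustration, not what cvs2svn does.

```python
import dbm


def record_paths(db_file, paths):
    # 'c' opens the database read-write, creating it if needed.
    db = dbm.open(db_file, 'c')
    try:
        for path in paths:
            # Only the key matters; store an empty value.
            db[path] = ''
    finally:
        db.close()


def path_exists(db_file, path):
    # 'r' opens an existing database read-only.
    db = dbm.open(db_file, 'r')
    try:
        return path in db
    finally:
        db.close()
```

Which backend dbm.open() picks depends on what the Python build provides,
so a check like the one above still matters before trusting it with a
large tree.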

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Tue Apr 29 01:55:53 2003

This is an archived mail posted to the Subversion Dev mailing list.