I'm in the process of taking a single svn repository that contains code
for two separate products and splitting it into two separate
repositories. Unfortunately, this turned out to be a lot more
complicated than it should be. So here are some suggestions for
improvements to svndumpfilter that would have been very helpful. I will
probably not find time to implement them myself, but I'm posting them
anyway as input for whoever is interested.
* better pattern matching (globbing or regular expressions)
Prefixes alone are not sufficient in many cases. E.g. I wanted to
exclude the "foo" module but not the "foobar" module. I don't think this
is possible with the current svndumpfilter (I ended up excluding "foo/",
but then I was left with an empty "foo" directory). I also wanted to
exclude certain modules in all tags and branches, so I had to preprocess
the dump and create an explicit list of prefixes. "exclude tags/*/foo"
would have been so much easier.
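To illustrate, here is a rough sketch of what glob-based exclude
matching could look like, using Python's fnmatch. This is hypothetical,
not real svndumpfilter code; matching the path and each of its parent
directories is what distinguishes "foo" from "foobar" and makes a rule
like "tags/*/foo" work:

```python
from fnmatch import fnmatchcase

def excluded(path, patterns):
    """Return True if any glob pattern matches the path or one of its
    ancestor directories.  Sketch only -- svndumpfilter currently
    supports plain prefixes, not globs."""
    parts = path.split("/")
    ancestors = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(a, p) for a in ancestors for p in patterns)

patterns = ["foo", "tags/*/foo"]
print(excluded("foo/main.c", patterns))        # True
print(excluded("foobar/main.c", patterns))     # False ("foo" != "foobar")
print(excluded("tags/1.0/foo/x.c", patterns))  # True, via "tags/*/foo"
```

Note that fnmatch's "*" also matches "/", so "tags/*/foo" matches at any
depth under tags/; a real implementation might want path-aware wildcards.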
* reading rules from a file
The current svndumpfilter expects all exclude rules as command-line
arguments. There are upper limits on command-line length, so this does
not scale well. Support for reading the rules from a file would be nice.
Of course, with globbing or regexp support far fewer rules would be
needed, so it probably wouldn't have been a problem.
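A rules file could be as simple as one pattern per line, with comments
and blank lines ignored. A minimal sketch (the file format is made up
for illustration):

```python
def load_patterns(path):
    """Read one exclude pattern per line from a rules file, skipping
    blank lines and '#' comments.  Hypothetical format -- nothing like
    this exists in svndumpfilter today."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]
```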
* improve performance
svndumpfilter takes a very long time to match transactions against the
exclude list. Performance could be improved dramatically by keeping the
prefixes in an ordered list, a B+-tree, or a similar structure instead
of comparing every path to every single prefix. Of course, if it used
proper pattern matching rather than prefixes this would be less
critical.
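Even without a tree structure, a hash set of prefixes already reduces
the cost from O(paths x prefixes) to O(paths x path-depth): instead of
comparing a path against every prefix, look up the path and each of its
ancestor directories. A sketch (again hypothetical, not the actual
svndumpfilter internals):

```python
def make_matcher(prefixes):
    """Build a fast exclude matcher from a list of path prefixes.
    Lookup cost is O(path depth) per path, independent of how many
    prefixes there are."""
    pset = set(prefixes)

    def matches(path):
        # Test the path itself, then each ancestor directory.
        while path:
            if path in pset:
                return True
            path = path.rpartition("/")[0]  # strip last component
        return False

    return matches

m = make_matcher(["foo", "tags/1.0/foo"])
print(m("foo/main.c"))     # True  (ancestor "foo" is excluded)
print(m("foobar/main.c"))  # False
```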
* revision numbers in rules
It's currently impossible to exclude some commits to a path but not
others. E.g. I have a tag that was created, deleted and recreated. I
want to exclude the first directory, but not the second. If I could
exclude "/tags/foo-XXX@1-500" that would have solved the problem.
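Parsing such revision-qualified rules would be straightforward. A sketch
of one possible "path@FIRST-LAST" syntax (the syntax and the helper
names are my own invention, not anything svndumpfilter supports):

```python
import re

# "path" alone, or "path@FIRST-LAST" with an inclusive revision range.
RULE = re.compile(r"^(?P<path>.+?)(?:@(?P<first>\d+)-(?P<last>\d+))?$")

def parse_rule(rule):
    """Split a rule into (path, first_rev, last_rev).  An unqualified
    rule covers all revisions (last_rev None meaning HEAD)."""
    m = RULE.match(rule)
    first = int(m.group("first")) if m.group("first") else 1
    last = int(m.group("last")) if m.group("last") else None
    return m.group("path"), first, last

def rule_applies(rule, path, rev):
    rpath, first, last = parse_rule(rule)
    return path == rpath and first <= rev and (last is None or rev <= last)

print(parse_rule("tags/foo@1-500"))  # ('tags/foo', 1, 500)
print(parse_rule("trunk"))           # ('trunk', 1, None)
```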
* handle invalid copies better
Currently svndumpfilter fails if the source path of a copy operation is
excluded but not the destination. Maybe it would be better if the user
could choose between different ways to handle this problem:
1. exclude the copy operation as well (a bit dangerous, but...)
2. include the source path anyway, despite the exclude rule
3. convert the copy to an add operation, so the file is included
correctly but its history is excluded
4. move the history of the file from the source path to the destination
of the copy; e.g. if a file is copied from "a" to "b" to "c" and "b" is
excluded, the copy from "b" would be replaced with a copy from "a"
I see that there are some svndumpfilter reimplementations that implement
option 3, but I haven't seen any that implement option 4.
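Option 3 amounts to dropping the copyfrom headers from the node record.
A sketch, modelling a dump-file node as a dict of its headers (the
Node-copyfrom-* header names are the real dump-format ones, the rest is
hypothetical):

```python
def copy_to_add(node, is_excluded):
    """Option 3: if a node copies from an excluded source, strip the
    copy-from headers so it becomes a plain add.  'node' is a dict of
    dump-file headers (hypothetical representation)."""
    src = node.get("Node-copyfrom-path")
    if src is not None and is_excluded(src):
        node = dict(node)  # don't mutate the caller's record
        node.pop("Node-copyfrom-path")
        node.pop("Node-copyfrom-rev", None)
        node["Node-action"] = "add"
    return node
```

Note that in a real dump file this is only half the job: a plain add
needs the full file contents, which the filter would have to pull from
the repository since the copy record itself doesn't carry them.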
The svndumpfilter tool could be improved in several ways to cover more
use cases, but even so there will likely be a lot of cases that require
custom code. Therefore it would be very good if svndumpfilter provided a
library that could be extended and customized, rather than forcing
everyone to start from scratch each time. Primarily the library should
have the following components:
* dumpfile parsing
* dumpfile generation
* invalid copy handling functions
In terms of a Python API, the parsing function should be a generator
that yields Transaction instances. The dumpfile generator should consume
an iterator of Transaction instances. Filters and copy-handling
functions should take one Transaction object as input and return an
(optionally modified) Transaction object. This would make it really easy
to extend svndumpfilter with new functionality.
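The shape of that API might look like the following sketch, where
Transaction, the node representation, and the stage-composition style
are all my own invention, not an existing library:

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    """Hypothetical model of one revision from a dump file."""
    rev: int
    nodes: list = field(default_factory=list)  # dicts of node headers

def exclude(pred):
    """Build a filter stage: a generator that consumes Transaction
    instances and yields them with matching nodes removed."""
    def stage(txns):
        for txn in txns:
            txn.nodes = [n for n in txn.nodes if not pred(n["Node-path"])]
            yield txn
    return stage

# A parser generator and a dumpfile writer would sit at either end:
#   write_dumpfile(stage(parse_dumpfile(sys.stdin)), sys.stdout)
txns = [Transaction(1, [{"Node-path": "foo/a"}, {"Node-path": "bar/b"}])]
drop_foo = exclude(lambda p: p == "foo" or p.startswith("foo/"))
out = list(drop_foo(txns))
print([n["Node-path"] for n in out[0].nodes])  # ['bar/b']
```

Because each stage is just a generator over Transactions, stages compose
freely and the whole pipeline streams, so arbitrarily large dumps never
need to be held in memory at once.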
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on 2008-01-09 02:24:51 CET