
dev@subversion.tigris.org

From: <screwdriver_at_lxnt.info>
Date: 2003-03-11 20:45:49 CET

Hello.

Work-in-progress-grade svndumpfilter sources are available at

http://lxnt.info/sdf/

(build.conf.patch is against r5119)

It is a massively mutated svnadmin. Thus, among other temporary
shortcuts, the --exclude and --include options became exclude and
include subcommands, taking path prefixes as the rest of the arguments.
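
A minimal sketch of how include/exclude prefix filtering could decide the fate of a node path. This is illustrative Python, not the C code from the sources above; the function names are hypothetical, and matching on whole path components (so that "trunk/foo" does not match "trunk/foobar") is an assumption rather than something the tool is known to do.

```python
def matches_prefix(path, prefixes):
    """Return True if path equals one of the prefixes or lies beneath it.

    Assumption: matching is done on whole path components, and leading
    or trailing slashes are not significant.
    """
    path = path.strip("/")
    for prefix in prefixes:
        prefix = prefix.strip("/")
        if prefix == "":
            return True  # an empty prefix matches every path
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def keep_node(path, prefixes, mode):
    """Decide whether a node survives filtering.

    mode is "include" (keep only matching paths) or "exclude"
    (drop matching paths); the two modes are mutually exclusive.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    matched = matches_prefix(path, prefixes)
    return matched if mode == "include" else not matched
```

Because the mode is a single value rather than two independent flags, the mutual exclusivity of include and exclude is enforced by construction, whichever command-line spelling ends up being used.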

It does work; at least, it managed to successfully filter (i.e. the results
loaded ok and checked out ok) a ~48MB dumpfile of a relatively simple repository
(this was on a box with 4GB RAM; see the question about memory management).

While working on it, a bunch of questions came up:

Should I turn subcommands back into options (keeping in mind that
they are mutually exclusive)?

In what encoding are paths stored in the dump? UTF-8?
(The question is: in what encoding do the header values
end up when supplied to the parser vtable members?)

Are header names case-insensitive?

Should the parser recalculate and verify MD5 sums?
(It now just passes them through intact.)
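
If verification were wanted, the check itself is small. The sketch below is illustrative Python using the standard hashlib module, not the C implementation; the function name is hypothetical, and it assumes the recorded digest is a hexadecimal string computed over the node's raw text bytes.

```python
import hashlib


def md5_matches(text, recorded_hex):
    """Compare the MD5 of a node's text with the digest recorded in the dump.

    text is the raw content as bytes; recorded_hex is the digest as a
    hexadecimal string (assumed format). Case and surrounding whitespace
    in the recorded value are ignored.
    """
    actual = hashlib.md5(text).hexdigest()
    return actual == recorded_hex.strip().lower()
```

A filter that passes text through unmodified could run this check and still emit the original digest unchanged, so verification and pass-through are not mutually exclusive.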

What, if anything, should be done with copyfrom data?
(It is now just passed through intact.)

What to do with revisions that contain only rev-props, like rev 0?
What to do with revisions that contain rev-props but have all
their nodes filtered out?

Current behavior is:

A revision is written out if any of the following holds:
 1. --drop-empty-revs has not been supplied.
 2. The revision has nodes remaining after filtering.
 3. The revision had no nodes before filtering.
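
The three cases above can be sketched as a single predicate. This is illustrative Python with hypothetical names, not the C code from the sources; it assumes the filter tracks node counts before and after filtering for each revision.

```python
def should_write_revision(drop_empty_revs, nodes_before, nodes_after):
    """Return True if the revision should appear in the output dump.

    drop_empty_revs: whether --drop-empty-revs was supplied.
    nodes_before:    number of nodes in the revision before filtering.
    nodes_after:     number of nodes remaining after filtering.
    """
    if not drop_empty_revs:
        return True   # case 1: nothing is ever dropped
    if nodes_after > 0:
        return True   # case 2: something survived filtering
    if nodes_before == 0:
        return True   # case 3: it was already empty, e.g. rev 0
    return False      # emptied by filtering, and dropping was requested
```

Under these rules a revision is dropped only when filtering itself emptied it and --drop-empty-revs was given; a revision that carried only rev-props to begin with is always kept.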

Does retaining the original revision numbers make any sense when some revisions
get dropped? Or should they be unconditionally renumbered?

Current behavior is to renumber if a revision is skipped.
AFAIK revision numbers are not taken into account when loading a dump, so this
seems harmless (and I implemented renumbering before I had a chance to think
about it :) ). As a side effect, implementing a shift of the revision numbers
in the resulting dumpstream is easy (though I am unsure whether that makes
any sense).
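
A minimal sketch of such renumbering, including the optional shift. It is illustrative Python with hypothetical names, not the C code from the sources; it assumes revisions are seen in dump order and that each one is already marked as written or skipped.

```python
def renumber(revisions, start=0):
    """Map original revision numbers to consecutive output numbers.

    revisions: iterable of (old_revnum, written) pairs in dump order,
               where written is False for revisions that were skipped.
    start:     number given to the first written revision; a nonzero
               value shifts the whole output range.
    Returns a dict from old revision number to new revision number,
    containing only the revisions that were written.
    """
    mapping = {}
    next_rev = start
    for old_rev, written in revisions:
        if written:
            mapping[old_rev] = next_rev
            next_rev += 1
    return mapping
```

Keeping the old-to-new mapping around, rather than only a counter, would also make it possible to translate revision references elsewhere in the stream if that turned out to be necessary.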

The PROPS section is optional when there are no props. Currently I have no
way of detecting whether it was present, and the code does not
output empty PROPS sections.

The resulting order of node and revision headers (sans content-length headers)
differs from the original because they pass through an apr_hash_t.

BTW: currently svnadmin treats a
text-content-length: 0
header as an attempt to set fulltext on a node. Should this be so?

What's the Right Thing to do with memory allocation?

At the moment, everything goes into a single pool, which, consequently,
grows up to the dump file size plus some overhead. Also, a node_baton and a
revision_baton get created for each new node and revision. I know this
is a mess. I think I should create only one instance of each, plus
a subpool for each, which will be clear()-ed every time a baton is reinitialized.
Is this ok from a pool-usage-guidelines point of view? (Keep in mind that
pointers to those subpools will be kept in the batons themselves.)

(The mess is there because I needed a more-or-less working filter ASAP, sorry.)

Should I stop using stringbuf streams to printf headers?
(Probably I should, but ...)

What exactly is the difference between dumpfile format versions 1 and 2?

PS.
1. The build framework is truly awesome.
2. Never have I seen libraries so easy to use.
Big thanks to everyone for doing such a great job!

-- 
/lxnt
Received on Tue Mar 11 20:45:49 2003

This is an archived mail posted to the Subversion Dev mailing list.
