Ben Collins-Sussman wrote:
>> Hm. I think you don't understand svn_repos_replay(). It isn't a
>> brute-force comparison. It fetches the list of changed paths and then,
>> using path-math, drives an editor to describe the changes. (Read the big
>> comment atop libsvn_repos/replay.c)
> Hm, okay, then we need to investigate why get_logs() is so quick
> compared to replay() on big imports -- something is fishy, for sure.
>> So, yes, you can make mailer.py just fetch the changed paths. And then you
>> can make it do all the work of tracking locations changes that occur with
>> copies and moves (so you can accurately generate diffs). But now you've
>> practically implemented svn_repos_replay + the ChangeCollector. :-)
Well, _replay() is necessarily slower than just fetching the list of changed
paths. (That's a guarantee, because a fetch of the list of changed paths is
the first thing that _replay() does!) And the measured cost of _replay()
will include the cost of any editor it drives, right?
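To make that cost argument concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not Subversion's code or API. The changed-paths table, `fetch_changed_paths()`, and `replay()` are all hypothetical, and the callback names are only loosely modeled on `svn_delta_editor_t`. The replay-like driver calls the cheap fetch as its first step. It then uses path math to open and close parent directories around each change, so its work is always the fetch plus the editor drive.

```python
from typing import Dict, List

# Hypothetical changed-paths table for one revision:
# path -> action ("A" added, "M" modified, "D" deleted).
CHANGES: Dict[str, str] = {
    "trunk/src/main.c": "M",
    "trunk/src/util.c": "A",
    "trunk/README": "M",
}

# File-level callback names, loosely modeled on svn_delta_editor_t.
FILE_OPS = {"A": "add_file", "M": "open_file", "D": "delete_entry"}


def fetch_changed_paths(changes: Dict[str, str]) -> List[str]:
    """The cheap operation: a flat list of changed paths, depth-first order."""
    return sorted(changes, key=lambda p: p.split("/"))


def replay(changes: Dict[str, str]) -> List[str]:
    """Replay-like driver; returns the editor calls it would make."""
    calls: List[str] = []
    paths = fetch_changed_paths(changes)  # step 1: the same cheap fetch
    stack: List[str] = []                 # components of open directories
    for path in paths:
        parents = path.split("/")[:-1]
        # Path math: how much of the open-directory stack is shared?
        common = 0
        while (common < len(stack) and common < len(parents)
               and stack[common] == parents[common]):
            common += 1
        # Close directories that are not ancestors of this path.
        while len(stack) > common:
            calls.append("close_directory " + "/".join(stack))
            stack.pop()
        # Open the missing ancestors, outermost first.
        for name in parents[common:]:
            stack.append(name)
            calls.append("open_directory " + "/".join(stack))
        calls.append(FILE_OPS[changes[path]] + " " + path)
    # Close whatever is still open, innermost first.
    while stack:
        calls.append("close_directory " + "/".join(stack))
        stack.pop()
    return calls


print(len(fetch_changed_paths(CHANGES)), len(replay(CHANGES)))  # 3 7
```

Three changed paths become seven editor calls in this model. A real editor would also do per-file work such as applying text deltas, and that work is charged to the replay as well.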
Here's where I think the benefit to mailer.py will come (and I say this as a
way of saying, "You may be onto something here, so keep investigating"):
mailer.py doesn't need to present its results in a tree structure. So all
the work that _replay() does to take a flat list of paths and regenerate the
tree structure is almost immediately discarded by the consumer. That's a
lot of wasted processing.
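A rough Python illustration of that waste follows, as a hypothetical sketch rather than how mailer.py is actually written. `DRIVE` is an invented editor drive for one revision, with callback names loosely modeled on `svn_delta_editor_t`. `flatten()` stands in for a consumer that only wants a flat list. It keeps the file-level events and throws away the directory open/close events that encode the rebuilt tree.

```python
from typing import List, Tuple

# Hypothetical editor drive for one revision, as (callback, path) pairs.
# Callback names are loosely modeled on svn_delta_editor_t.
DRIVE: List[Tuple[str, str]] = [
    ("open_directory", "trunk"),
    ("open_file", "trunk/README"),
    ("open_directory", "trunk/src"),
    ("open_file", "trunk/src/main.c"),
    ("add_file", "trunk/src/util.c"),
    ("close_directory", "trunk/src"),
    ("close_directory", "trunk"),
]

# Events that exist only to describe the tree structure.
TREE_OPS = {"open_directory", "close_directory"}


def flatten(drive: List[Tuple[str, str]]) -> List[str]:
    """Keep only file-level events; the directory nesting is discarded."""
    return sorted(path for op, path in drive if op not in TREE_OPS)


flat = flatten(DRIVE)
discarded = sum(1 for op, _ in DRIVE if op in TREE_OPS)
print(flat)       # ['trunk/README', 'trunk/src/main.c', 'trunk/src/util.c']
print(discarded)  # 4
```

In this toy drive, four of the seven callbacks exist only to rebuild the tree, and the flat consumer ends up with the same list the changed-paths fetch would have given it directly.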
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2009-01-08 16:49:16 CET