Re: philosophical questions about mailer.py

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Thu, 08 Jan 2009 10:48:55 -0500

Ben Collins-Sussman wrote:
>> Hm. I think you don't understand svn_repos_replay(). It isn't a
>> brute-force comparison. It fetches the list of changed paths and then,
>> using path-math, drives an editor to describe the changes. (Read the big
>> comment atop libsvn_repos/replay.c)
> Hm, okay, then we need to investigate why get_logs() is so quick
> compared to replay() on big imports -- something is fishy, for sure.
>> So, yes, you can make mailer.py just fetch the changed paths. And then you
>> can make it do all the work of tracking locations changes that occur with
>> copies and moves (so you can accurately generate diffs). But now you've
>> practically implemented svn_repos_replay + the ChangeCollector. :-)
> Doh.

Well, _replay() is necessarily slower than just fetching the list of changed
paths. (That's a guarantee, because a fetch of the list of changed paths is
the first thing that _replay() does!) And the measured cost of _replay()
will include the cost of any editor it drives, right?

Here's where I think the benefit to mailer.py will come (and I say this as a
way of saying, "You may be onto something here, so keep investigating"):
mailer.py doesn't need to present its results in a tree structure. So all
the work that _replay() does to take a flat list of paths and regenerate the
tree structure is almost immediately discarded by the consumer. That's a
lot of wasted processing.
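
To make that concrete, here is a rough sketch in plain Python. It does not use the Subversion bindings at all; the `changed` dictionary and the helper names `flat_report`, `build_tree` and `flatten` are made up purely for illustration. It contrasts a consumer that reads the flat changed-path list directly with one that regroups the paths into a tree, only to flatten them again for the report.

```python
# Hypothetical changed-path data (path -> action code). This stands in for
# whatever a changed-paths fetch would return; it is not real Subversion output.
changed = {
    "trunk/src/main.c": "M",
    "trunk/src/util.c": "A",
    "trunk/README": "M",
    "branches/1.x/README": "D",
}


def flat_report(changed):
    """One report line per changed path, read straight from the flat list."""
    return ["%s %s" % (action, path) for path, action in sorted(changed.items())]


def build_tree(changed):
    """Regroup the flat paths into nested dicts, one dict per directory."""
    tree = {}
    for path, action in changed.items():
        parts = path.split("/")
        node = tree
        for name in parts[:-1]:
            node = node.setdefault(name, {})
        node[parts[-1]] = action
    return tree


def flatten(tree, prefix=""):
    """Walk the tree depth-first and turn it back into report lines."""
    lines = []
    for name in sorted(tree):
        value = tree[name]
        if isinstance(value, dict):
            lines.extend(flatten(value, prefix + name + "/"))
        else:
            lines.append("%s %s" % (value, prefix + name))
    return lines


if __name__ == "__main__":
    for line in flat_report(changed):
        print(line)
```

Both routes produce the same set of report lines for this data. The second one just builds and walks an intermediate structure that a flat email body never needs, which is the kind of overhead I'm guessing at above.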

C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2009-01-08 16:49:16 CET

This is an archived mail posted to the Subversion Dev mailing list.