Re: Speeding up blame

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2004-05-28 21:16:51 CEST

On Thu, 27 May 2004, Mark Benedetto King wrote:

> On Thu, May 27, 2004 at 07:59:26PM +0200, Peter N. Lundblad wrote:
> > >
> > > I don't think that the RA implementation can reconstruct the fulltext
> > > from the deltas (at least not without some WC callbacks; if that's
> > > what you're proposing, then I think we're on roughly the same page).
> > >
> > You're right, of course. It would have to keep track of the last fulltext.
>
> Well, not completely right. The RA layer could get the fulltext of the
> start revision, and then deltas for the revisions thereafter.
>
What I was referring to was that the RA layer on the client needs to keep
track of the last fulltext in a temporary file for the previous revision
each time.

> It would be neat if the fact that the WC adm area contains the
> pristine BASE rev could be brought into play (perhaps the RA layer
> would retrieve the delta from BASE->start rather than the fulltext
> of start), but the wins in terms of network i/o are probably not
> worth the additional implementation complexity.
>
I think this would be an optimization that is worth it when blaming a
small revision range near BASE. But then you have little data transferred
anyway... Don't think it is worth the complexity.

> One last blue sky idea:
>
> Right now, svn_client_blame() uses svn_diff_file_diff(),
> but there is also a generalized interface that could be applied
> to streams. Using that interface, it should be possible to
> avoid constructing any of the fulltexts at all, since the
> stream of revision N+1 can be computed from the stream of
> revision N and the delta from N to N+1.
>
How do you apply this to a forward-only svn_stream_t? It needs to be able
to compare arbitrary tokens, it seems.

Below is a little write-up of my current design plans for this little
improvement. Don't know if it is worth committing into notes temporarily.
Anyway, it is good to write one's thoughts down...

Blame Optimization Plan
=======================

This is a temporary document explaining the plans for optimizing the SVN
blame functionality. It will be removed when this is implemented.

The Problem
-----------

Today, blame:
a) Transfers each relevant revision of a file in fulltext,
b) Needs a network round-trip for each revision of a file that is examined.

a) is mostly a problem over slow links and b) is a problem over a WAN.

The Solution
------------

The proposed solution is to implement a new RA layer function:

svn_error_t *
get_file_revs(void *session_baton, const char *path, svn_revnum_t start,
  svn_revnum_t end, svn_ra_file_rev_handler_t handler, void *handler_baton,
  apr_pool_t *pool);
in the RA vtable.

This will call handler for a range of revisions of the file specified
by path and end. It will start at the youngest revision at or before start,
where the file was changed if such a revision exists. Else, it will
use the first revision of the file in the repository. It will then
call handler for each revision where the file was changed until end is
reached. handler will never be called for a revision larger than end.

NOTE: It is necessary to deliver a revision before or at start, so
that the blame implementation can differentiate between changes made
in the first revision at or after start and the contents before
start. The client can't just fetch the contents of the file in start
- 1 since that may be an unrelated object with the same path. So we
choose between this and having to use a log call to determine the
history before start.
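
To make the revision selection rule above concrete, here is a rough
sketch in plain C. It is not Subversion code: toy_get_file_revs,
revnum_t, file_rev_handler_t, rev_log and record_rev are all
hypothetical stand-ins, the file's history is just a sorted array of
revision numbers, and the handler only receives the revision number.

```c
#include <stddef.h>

typedef long revnum_t;   /* hypothetical stand-in for svn_revnum_t */
typedef void (*file_rev_handler_t)(void *baton, revnum_t revision);

/* changed[] lists, in ascending order, the revisions in which the
   file was changed (an assumption made for this sketch). */
static void
toy_get_file_revs(const revnum_t *changed, size_t n_changed,
                  revnum_t start, revnum_t end,
                  file_rev_handler_t handler, void *baton)
{
  size_t first = 0;   /* fallback: the first revision of the file */
  size_t i;

  /* Pick the youngest changed revision at or before start, if any. */
  for (i = 0; i < n_changed; i++)
    if (changed[i] <= start)
      first = i;

  /* Deliver every changed revision from there on, never past end. */
  for (i = first; i < n_changed && changed[i] <= end; i++)
    handler(baton, changed[i]);
}

/* Hypothetical handler that just records which revisions it saw. */
struct rev_log { revnum_t revs[16]; size_t count; };

static void
record_rev(void *baton, revnum_t revision)
{
  struct rev_log *log = baton;
  if (log->count < 16)
    log->revs[log->count++] = revision;
}
```

With changes in revisions 3, 7, 12 and 20 and a requested range of
10 to 15, the handler is called for 7 and 12: revision 7 supplies the
contents as of start, and 20 is beyond end.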

svn_ra_file_rev_handler_t is a callback defined as follows:
typedef svn_error_t *
  (*svn_ra_file_rev_handler_t) (void *baton,
    const char *path, svn_revnum_t revision, const apr_hash_t *rev_props,
    apr_file_t *contents, const apr_hash_t *props,
    apr_pool_t *pool);

The file contents will be available in contents. During a call to the
handler, contents from the last call will still be valid. Also,
contents from the last call will live in the pool given to get_file_revs,
so it will be available after the call. [Are these semantics too
weird for a general-purpose RA layer function? Are they too blame-oriented?]

NOTE: The caller of get_file_revs will get fulltext rather than
deltas. Deltas will, however, be used over network RA access
methods. I don't see any need for giving the client text deltas
instead of fulltexts. The blame command will recreate the files and
diff them anyway. If a client needs deltas, it can easily create them
itself.

NOTE: We give the callback apr_file_ts instead of svn_streams so it
can read the files more than once. The RA layer will need to create
temporary files anyway to be able to apply text deltas for the
following revisions.

Implementation
--------------

On the server side, get_file_revs will be implemented in terms of
svn_fs_node_history and a series of calls to
svn_fs_get_file_delta_stream. The deltas will be sent over the wire
and turned into fulltexts on the client. I've been considering a
svn_repos function for this, but since it will be duplicated in
ra_local anyway (see below), it might not be worth the little saving
in code. [Is this a good design decision?]

In ra_local, we don't need deltification at all. We just use
svn_fs_file_contents for each interesting revision.

On the client, and in ra_local, we will use open_tmp_file to ask the
client to create temporary files for us.
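
For illustration only, the client-side step of turning a delta into
the next fulltext could look roughly like the sketch below. The delta
format here is a made-up toy (copy a range of the previous fulltext,
or insert literal bytes), not the real svndiff format, and it works
on memory buffers rather than temporary files. toy_op and
toy_apply_delta are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical delta op: copy from the previous fulltext, or insert
   literal new data. */
typedef struct toy_op {
  int copy_from_source;   /* non-zero: copy from previous fulltext */
  size_t offset;          /* source offset (copy ops only) */
  size_t length;          /* number of bytes to copy or insert */
  const char *new_data;   /* literal bytes (insert ops only) */
} toy_op;

/* Build the next fulltext in out from the previous fulltext and a
   delta. Returns the number of bytes written, or (size_t)-1 if the
   delta is invalid or out is too small. */
static size_t
toy_apply_delta(const char *prev, size_t prev_len,
                const toy_op *ops, size_t n_ops,
                char *out, size_t out_cap)
{
  size_t written = 0;
  size_t i;

  for (i = 0; i < n_ops; i++)
    {
      const toy_op *op = &ops[i];
      const char *src;

      if (op->length > out_cap - written)
        return (size_t)-1;              /* output buffer too small */

      if (op->copy_from_source)
        {
          if (op->offset > prev_len || op->length > prev_len - op->offset)
            return (size_t)-1;          /* copy range outside source */
          src = prev + op->offset;
        }
      else
        src = op->new_data;

      memcpy(out + written, src, op->length);
      written += op->length;
    }
  return written;
}
```

In a real implementation the output of one step would become the
source of the next, which is why the previous fulltext has to be kept
around (in the plan above, in a temporary file) for each revision.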

Received on Fri May 28 21:08:38 2004

This is an archived mail posted to the Subversion Dev mailing list.