
Re: Looking to improve performance of svn annotate

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Thu, 12 Aug 2010 15:57:53 +0100

On Thu, 2010-08-12, C. Michael Pilato wrote:
> In times past, I've wondered if the server couldn't just store line-delta
> information -- as a comparison between each FS DAG node and its immediate
> predecessor -- similar to the way that CVS does (and in addition to the
> stuff it already stores, of course). The line-delta info could be populated
> post-commit just as the BDB backend did deltafication, or perhaps also on
> demand (to rebuild this information for older servers) via some 'svnadmin'
> command.
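For illustration only: a per-node line delta of the kind described above could be as simple as the edit script between a node's lines and its predecessor's lines. The sketch below uses Python's difflib as a stand-in diff algorithm. The function names `line_delta` and `apply_delta` and the (start, end, inserted_lines) format are hypothetical. They are loosely in the spirit of CVS/RCS-style deltas, not any actual Subversion storage format.

```python
import difflib

def line_delta(old_lines, new_lines):
    """Return a compact edit script turning old_lines into new_lines.

    Each entry is (old_start, old_end, inserted): old_lines[old_start:old_end]
    is replaced by the list `inserted`. Unchanged regions are omitted.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace/delete/insert regions
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta

def apply_delta(old_lines, delta):
    """Rebuild the newer version from the older one plus its delta."""
    result = []
    pos = 0
    for start, end, inserted in delta:
        result.extend(old_lines[pos:start])  # copy the unchanged stretch
        result.extend(inserted)              # splice in the new lines
        pos = end                            # skip the replaced old lines
    result.extend(old_lines[pos:])
    return result
```

Storing only the changed regions keeps each delta small, and a blame implementation could walk these per-revision deltas instead of re-diffing full texts for every revision.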

The server could usefully calculate and store it, but on demand and
cached rather than at commit time (see below).

> But it shouldn't ever change once calculated, right?

Well... that depends. The line-ending style the client wants to use
can vary over time: to cope with cases where it was not specified
correctly in earlier revisions, the client may want to use the EOL
style specified by HEAD (or by the latest version being blamed). It
may be possible to calculate and store a generic data set that will be
useful whichever EOL style the client eventually decides to use.
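One way to read the "generic data set" idea, sketched under the assumption that lines are stored with their terminators stripped: split on any of LF, CRLF or CR so that the stored line boundaries do not depend on the EOL style, then re-attach whatever style the client picks. The function names are hypothetical. A regex is used instead of str.splitlines() because splitlines() also breaks on characters such as form feed and vertical tab, which would not count as line endings here.

```python
import re

# Matches any of the three common line terminators. "\r\n" is listed
# first so a CRLF pair counts as one terminator, not two.
_EOL = re.compile(r"\r\n|\r|\n")

def split_lines_eol_agnostic(data: str) -> list:
    """Split text into lines with the terminators stripped.

    The result carries no EOL characters, so it can be reused whatever
    EOL style the client later chooses.
    """
    lines = _EOL.split(data)
    # A trailing terminator yields an empty final element; drop it so that
    # "a\nb\n" and "a\r\nb\r\n" produce the same ["a", "b"].
    if lines and lines[-1] == "":
        lines.pop()
    return lines

def render(lines: list, eol: str) -> str:
    """Re-attach a chosen EOL style (e.g. "\n" or "\r\n") to every line."""
    return "".join(line + eol for line in lines)
```

This simplification forgets whether the file originally ended with a newline, and it assumes already-decoded text; a real implementation working on raw bytes would need to record such details.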

> My only concern is in dealing with the definition of a "line". The FS layer
> today is happily file-content agnostic. All EOL translation stuffs are
> considered voodoo of interest only to the client layers. We could, of
> course, choose to make the FS layer pay attention to the likes of the
> svn:eol-style property, but that feels mucho icky.
>
> Thoughts?

If the decision to calculate and store linewise info for a given file is
made at commit time by the server, it would probably want to consider
svn:mime-type or the like, to avoid doing it unnecessarily on all
files. Even so, clients might never request blame info for the majority
of those files. Conversely, a client might request blame info for a file
for which the server thought the data would not be needed (perhaps
because it was marked 'application/octet-stream', for example).

I would think that having the server calculate the info on demand and
store it in a limited-lifetime cache is more sensible, if we take a
server-assisted approach at all. A cache would handle the cases above
better, and could also be made to handle requests with varying
definitions of EOL style if that is necessary.
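A minimal sketch of the on-demand, limited-lifetime cache suggested above, assuming blame results can be keyed by (path, revision, EOL style). `BlameCache`, the `compute` callback, and the one-hour default lifetime are hypothetical illustrations, not Subversion APIs. The EOL-style strings would presumably be svn:eol-style values such as 'native' or 'CRLF'.

```python
import time
from typing import Callable, Dict, Tuple

class BlameCache:
    """Hypothetical limited-lifetime cache for server-computed blame data.

    Entries are keyed by (path, revision, eol_style), so requests that
    differ only in EOL style get separate entries. Each entry expires
    ttl seconds after it was computed.
    """

    def __init__(self, compute: Callable[[str, int, str], object],
                 ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._compute = compute  # the expensive blame calculation
        self._ttl = ttl
        self._clock = clock      # injectable for testing
        self._entries: Dict[Tuple[str, int, str], Tuple[float, object]] = {}

    def get(self, path: str, revision: int, eol_style: str) -> object:
        key = (path, revision, eol_style)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: reuse it
        # Missing or expired: compute on demand and remember when.
        value = self._compute(path, revision, eol_style)
        self._entries[key] = (now, value)
        return value

    def purge_expired(self) -> None:
        """Drop stale entries so the cache does not grow without bound."""
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items()
                         if now - v[0] < self._ttl}
```

Keying on EOL style lets requests with differing definitions coexist. If an EOL-agnostic representation works out, the EOL component could instead be dropped from the key and applied after lookup.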

I'm wary of embedding any client functionality in the server, but I
guess it's worth considering if it would be that useful. If so, let's
take great care to ensure it's only lightly coupled to the core server
logic.

- Julian
Received on 2010-08-12 16:58:45 CEST

This is an archived mail posted to the Subversion Dev mailing list.