On Wed, 2010-08-11 at 19:14 -0400, Johan Corveleyn wrote:
> I naively thought that the server, upon being called get_file_revs2,
> would just supply the deltas which it has already stored in the
> repository. I.e. that the deltas are just the native format in which
> the stuff is kept in the back-end FS, and the server wasn't doing much
> else but iterate through the relevant files, and extract the relevant
> bits.

The server doesn't have deltas between each revision and the next (or
previous). Instead, it has "skip-deltas" which may point backward a
large number of revisions. This allows any revision of a file to be
reconstructed in O(log(n)) delta applications, where n is the number of
file revisions, but it makes "what the server has lying around" even
less useful for blame output.
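
To make the O(log(n)) claim concrete, here is a minimal sketch of a
skip-delta chain. It assumes the delta base for the n-th file revision
is n with its lowest set bit cleared, which is my understanding of the
FSFS scheme; the function names are made up for illustration and are
not Subversion APIs.

```python
from typing import List


def skip_delta_base(n: int) -> int:
    """Return the revision index that revision n is stored as a delta against.

    Assumption: the base is n with its lowest set bit cleared.
    """
    return n & (n - 1)


def delta_chain(n: int) -> List[int]:
    """List the revisions visited while reconstructing revision n.

    Index 0 stands for the fulltext at the start of the chain.
    """
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


if __name__ == "__main__":
    print(delta_chain(54))  # [54, 52, 48, 32, 0]
```

Under that assumption, the number of delta applications equals the
number of 1 bits in n, so it never exceeds log2(n) + 1. Note that none
of the stored deltas in the chain for 54 is a delta against 53, which
is what blame would want.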

It's probably best to think of the FS as a black box which can produce
any version of a file in reasonable time. If you look at
svn_repos_get_file_revs2 in libsvn_repos/rev_hunt.c, you'll see the code
which produces deltas to send to the client, using
svn_fs_get_file_delta_stream.
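
Conceptually, the loop looks something like the sketch below. This is
not the code in rev_hunt.c: BlackBoxFS and file_revs are hypothetical
names, and difflib stands in for the real delta generator. The point is
only that each delta is computed fresh from two reconstructed
fulltexts rather than read out of storage.

```python
import difflib


class BlackBoxFS:
    """Hypothetical stand-in for the FS: it can return any revision's fulltext."""

    def __init__(self, fulltexts):
        self._fulltexts = fulltexts  # maps revision number -> list of lines

    def fulltext(self, rev):
        return self._fulltexts[rev]


def file_revs(fs, revs):
    """Yield (rev, delta) pairs for the given revisions, in order.

    Each delta is computed on the fly against the previous revision in
    the list; the first one is computed against an empty file.
    """
    prev = []
    for rev in revs:
        cur = fs.fulltext(rev)
        matcher = difflib.SequenceMatcher(None, prev, cur, autojunk=False)
        yield rev, matcher.get_opcodes()
        prev = cur


if __name__ == "__main__":
    fs = BlackBoxFS({1: ["a"], 5: ["a", "b"]})
    for rev, delta in file_revs(fs, [1, 5]):
        print(rev, delta)
```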

The required code changes for this kind of optimization would be fairly
deep, I think. You'd have to invent a new type of "diffy" delta
algorithm (either line-based or binary, but either way producing an edit
stream rather than acting like a compression algorithm), and then
parameterize a bunch of functions which produce deltas, and then have
the server-side code produce diffy deltas, and then have the client code
recognize when it's getting diffy deltas and behave more efficiently.

If the new diffy-delta algorithm isn't format-compatible with the
current encoding, you'd also need some protocol negotiation.
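
To illustrate what a line-based "diffy" delta could buy the client,
here is a rough sketch. The op format ("keep" / "insert") and both
function names are invented for this example and do not correspond to
any existing Subversion delta encoding; difflib is used only as a
convenient way to get an edit script.

```python
import difflib


def diffy_delta(old_lines, new_lines):
    """Build an edit stream: ("keep", i1, i2) copies old_lines[i1:i2],
    ("insert", lines) adds new lines. Deleted lines are simply not kept."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i1, i2))
        elif j2 > j1:  # "replace" or "insert": new lines appear here
            ops.append(("insert", new_lines[j1:j2]))
    return ops


def apply_blame(old_blame, delta, rev):
    """Update a list of (revision, line) pairs from an edit stream.

    Kept lines retain their attribution; inserted lines are blamed on rev.
    No fulltext has to be rebuilt and diffed again on the client.
    """
    new_blame = []
    for op in delta:
        if op[0] == "keep":
            new_blame.extend(old_blame[op[1]:op[2]])
        else:
            new_blame.extend((rev, line) for line in op[1])
    return new_blame


if __name__ == "__main__":
    old = ["a", "b", "c"]
    new = ["a", "x", "c", "d"]
    blame = [(1, line) for line in old]
    print(apply_blame(blame, diffy_delta(old, new), 2))
    # [(1, 'a'), (2, 'x'), (1, 'c'), (2, 'd')]
```

A compression-style delta gives no such guarantee: its copy
instructions may pull bytes from anywhere in the source, so the client
has to rebuild both fulltexts and run its own diff.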

Received on 2010-08-12 01:34:00 CEST