
Re: [RFC] : full-text instead of vdelta against empty bytestream

From: David Kimdon <david_at_kimdon.org>
Date: 2003-11-04 22:13:41 CET

On Tue, Oct 21, 2003 at 09:47:19PM -0400, mark benedetto king wrote:
> I wish I could take credit for this idea, but Greg Hudson pointed out
> on IRC that we could store the youngest revisions of files as deltas
> against the empty bytestream.
>
> Then there would be no CPU cost to the compression; it would be already
> done, and the one-time-per-commit cost amortized over the many checkouts.
>
> I think this would also cut down repository size, since the fulltexts would
> not exist anywhere in the repository. The smaller size could conceivably
> improve performance, since there would be fewer pages to read for certain
> operations, higher chance that any particular page would be in cache, etc.

I now have two different patches[1] that store the youngest revisions
as deltas against the empty bytestream.
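
A delta against the empty bytestream has no source to copy from, so it
can only hold literal data plus copies from earlier in the target
itself. In effect it is a compressed form of the fulltext. The Python
below is a minimal sketch of that idea, not Subversion's vdelta code:
the op format, MIN_MATCH, and both function names are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical op format (not svndiff):
#   ("new", data)            -> append literal bytes
#   ("copy", offset, length) -> copy bytes already produced in the target
Op = Union[Tuple[str, bytes], Tuple[str, int, int]]

MIN_MATCH = 4  # assumed minimum match length worth encoding as a copy


def delta_against_empty(target: bytes) -> List[Op]:
    """Encode target using only literals and self-referential copies."""
    ops: List[Op] = []
    index = {}  # MIN_MATCH-byte key -> earliest offset where it was seen
    literal = bytearray()
    pos = 0
    while pos < len(target):
        key = target[pos:pos + MIN_MATCH]
        start = index.get(key)  # short keys are never indexed, so None
        if start is not None:
            # Extend the match as far as it goes; it may overlap pos.
            length = MIN_MATCH
            while (pos + length < len(target)
                   and target[start + length] == target[pos + length]):
                length += 1
            if literal:
                ops.append(("new", bytes(literal)))
                literal.clear()
            ops.append(("copy", start, length))
            pos += length
        else:
            if len(key) == MIN_MATCH:
                index[key] = pos
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append(("new", bytes(literal)))
    return ops


def apply_delta(ops: List[Op]) -> bytes:
    """Rebuild the fulltext; no source stream is needed."""
    out = bytearray()
    for op in ops:
        if op[0] == "new":
            out += op[1]
        else:
            _, offset, length = op
            for i in range(length):  # byte by byte, so overlaps work
                out.append(out[offset + i])
    return bytes(out)
```

For repetitive input such as b"abcd" * 100, this encoder emits one
four-byte literal followed by a single overlapping copy, and
apply_delta() reproduces the original bytes from the ops alone.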

Approach 1: Do the youngest-revision deltification inside
deltify_mutable(), which is where the rest of the deltification
occurs. This means the representation is first stored as fulltext and
then later compressed (deltified).

Approach 2: Never create a fulltext. Instead, the youngest revision is
stored as a delta against the empty bytestream from the moment the
representation is created (the beginning of that node revision's
life), so nothing extra needs to be done when the rest of the
deltification occurs. This one is conceptually cleaner, but my
implementation currently has some rough edges.
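
The difference between the two approaches is mostly one of ordering,
which a toy model can make concrete. The Python below is only a
sketch: ToyStore, Rep, and the two commit functions are hypothetical,
and zlib stands in for "delta against the empty bytestream". It counts
writes per commit and does not model Berkeley DB or the real
deltify_mutable().

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rep:
    kind: str    # "fulltext" or "delta"
    data: bytes  # stored bytes for this representation


@dataclass
class ToyStore:
    reps: Dict[str, Rep] = field(default_factory=dict)
    writes: List[int] = field(default_factory=list)  # size of each write

    def put(self, key: str, rep: Rep) -> None:
        self.reps[key] = rep
        self.writes.append(len(rep.data))

    def read(self, key: str) -> bytes:
        rep = self.reps[key]
        # zlib is a stand-in for applying a delta to an empty source.
        return zlib.decompress(rep.data) if rep.kind == "delta" else rep.data


def commit_approach_1(store: ToyStore, key: str, text: bytes) -> None:
    # Stored as fulltext first ...
    store.put(key, Rep("fulltext", text))
    # ... then rewritten during the later deltification pass.
    store.put(key, Rep("delta", zlib.compress(text)))


def commit_approach_2(store: ToyStore, key: str, text: bytes) -> None:
    # Stored as a delta from the start; no fulltext is ever written.
    store.put(key, Rep("delta", zlib.compress(text)))


text = b"a line of file content\n" * 500
store_1, store_2 = ToyStore(), ToyStore()
commit_approach_1(store_1, "rep-1", text)
commit_approach_2(store_2, "rep-1", text)
```

Both stores return the same fulltext from read(), but store_1 records
two writes (one of them the full-size fulltext) where store_2 records
one.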

Approach 2 saves disk space compared with the current fulltext
storage. Approach 1 does not. I think this has something to do with
the way the database works: the fulltext lingers in the database file
for a revision or so after it is no longer needed. It is not actually
part of the database (db_dump doesn't show the fulltext strings, but a
hex editor does). If anyone can explain that, I'm eager to know the
details.

Neither approach has a strongly negative effect on performance, but
neither has a strongly positive effect either; the results are a mixed
bag in both cases.

The next idea is to use the fact that the youngest revisions are now
deltified to speed up checkout and export. I have a broken (but
complete, maybe...) patch that does this; I am still working through
the details.

David

[1] : Neither of them is in a presentable form at present.

Received on Tue Nov 4 21:22:02 2003

This is an archived mail posted to the Subversion Dev mailing list.
