Re: [RFC] : full-text instead of vdelta against empty bytestream

From: David Kimdon <david_at_kimdon.org>
Date: 2003-11-05 00:22:20 CET

On Tue, Nov 04, 2003 at 03:27:06PM -0600, kfogel@collab.net wrote:
> David Kimdon <david@kimdon.org> writes:
> > The next idea is to use the fact that we have deltified youngest
> > revisions to speed up checkout and export. I have a broken (but
> > complete, maybe . . .) patch that does this, still working through the
> > details.
>
> I thought we *don't* have deltified youngest revisions,

true.

> and the reason
> we don't deltify them is precisely to speed up read operations.

That's the theory as I understand it as well.

> Am I missing something?

Another theory is to keep the revisions in a form that is easily
digestible by certain operations that we would like to speed up[1].
The expectation is that other operations will not be inordinately
slowed, thanks to the delta combiner. On checkout and export of head
(a fairly common operation?), or when an update adds a new file, we
always create delta windows against an empty byte-stream. That delta
takes a lot of CPU cycles to calculate. If, instead of computing
that textdelta, we only needed to convert the stored svndiff into a
textdelta, the server would spend much less CPU time.
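
To make the cost argument concrete, here is a minimal sketch in Python.
It is a toy model, not Subversion's real window structures or API: the
names Window, fulltext_windows and apply_windows, the op tuples, and
the tiny WINDOW_SIZE are all made up for illustration. It shows that a
delta against an empty byte-stream can be expressed as new-data-only
windows, which need no match searching at all to produce.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical window size, kept tiny so the example is easy to follow.
WINDOW_SIZE = 8


@dataclass
class Window:
    """Toy delta window (assumed layout, not the real svndiff format).

    Each op is either ("new", data) or ("copy_target", offset, length),
    where offset is relative to the start of this window's output.
    There is no source-copy op because the source is empty.
    """
    ops: List[Tuple]


def fulltext_windows(data: bytes, window_size: int = WINDOW_SIZE) -> List[Window]:
    """Wrap fulltext in new-data-only windows: slicing, no match search."""
    return [Window([("new", data[i:i + window_size])])
            for i in range(0, len(data), window_size)]


def apply_windows(windows: List[Window]) -> bytes:
    """Rebuild the target from windows computed against an empty source."""
    out = bytearray()
    for window in windows:
        start = len(out)  # where this window's output begins
        for op in window.ops:
            if op[0] == "new":
                out += op[1]
            elif op[0] == "copy_target":
                _, offset, length = op
                # Byte-by-byte so that overlapping copies work.
                for k in range(length):
                    out.append(out[start + offset + k])
            else:
                raise ValueError("unknown op: %r" % (op[0],))
    return bytes(out)
```

A delta algorithm run against the empty stream would spend its time
hunting for the "copy_target" repeats; the new-data-only form skips
that search at the price of a larger delta.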

That's the theory anyway,

-David

[1] : I've also been toying with the idea of multiple representations
of the same node-revision. This idea might be too much along the
lines of representation undeltification, i.e. not as much of a
performance enhancement as hoped for. Anyway, the fs can choose
which sub-representation it wants to use based on the operation
requested. Consider, for a given node:

- update from r3 to r10 - fs doesn't have that sub-representation, it
  calculates it and stores a new sub-representation, no speedup here

- update from r6 to r10 - fs calculated that yesterday, and saved the
  svndiff sub-representation, a bit of conversion (into textdelta
  format) and it is off to the consumer more quickly than if it needed
  to do the calculation from scratch
  
- checkout, export of any version - fs always keeps a
  sub-representation of the file against an empty byte-stream so this
  can be speedy too, but keeping every version as fulltext, ouch,
  that's a lot of space.
  
  etc.

In this case we are using more disk space and saving CPU time. There
would need to be a process of pruning sub-representations that haven't
been used in a while to avoid repository bloat, of course.
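
The sub-representation idea above can be sketched roughly as follows.
This is only an illustration under stated assumptions: SubRepCache, its
methods, the (node, from_rev, to_rev) key, and the idle-time pruning
rule are all hypothetical, and nothing like this exists in the fs
today. A from_rev of None stands for the empty byte-stream case used
by checkout and export.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# (node id, base revision or None for the empty byte-stream, target revision)
Key = Tuple[str, Optional[int], int]


class SubRepCache:
    """Hypothetical store of precomputed deltas ("sub-representations")."""

    def __init__(self,
                 compute: Callable[[str, Optional[int], int], bytes],
                 max_idle_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._compute = compute            # the expensive delta calculation
        self._max_idle = max_idle_seconds  # prune entries idle longer than this
        self._clock = clock
        self._store: Dict[Key, Tuple[bytes, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, node: str, from_rev: Optional[int], to_rev: int) -> bytes:
        """Return the delta, computing and saving it on first request."""
        key = (node, from_rev, to_rev)
        entry = self._store.get(key)
        if entry is None:
            self.misses += 1  # no speedup here: calculate from scratch
            delta = self._compute(node, from_rev, to_rev)
        else:
            self.hits += 1    # calculated earlier: reuse the saved bytes
            delta = entry[0]
        self._store[key] = (delta, self._clock())  # record last use
        return delta

    def prune(self) -> int:
        """Drop entries not used within max_idle_seconds; return the count."""
        now = self._clock()
        stale = [key for key, (_, last_used) in self._store.items()
                 if now - last_used > self._max_idle]
        for key in stale:
            del self._store[key]
        return len(stale)
```

The first "update from r3 to r10" is a miss and pays the full cost; a
repeat of the same request is a hit. A real design would presumably
exempt the empty byte-stream entries from pruning, which this sketch
does not do.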

Received on Tue Nov 4 23:30:42 2003

This is an archived mail posted to the Subversion Dev mailing list.
