On 8 Feb 2002, Greg Hudson wrote:
> On Fri, 2002-02-08 at 17:03, Greg Stein wrote:
> > The theory is that you can combine a series of (small) deltas into a single
> > delta. Then you grab the fulltext and apply the one delta. Right now, we
> > produce intermediate fulltexts as we apply each delta in turn. If those
> > fulltexts spill outside of a window size, then everything goes to hell
> > (which is why we disable deltas for sources larger than the window).
>
> Do we understand why everything goes to hell when plaintexts spill over
> the window size? Windows are there to restrict memory usage, not make
> things worse.
>
> I think I understand the general theory. It goes something like:
>
> * If we apply all the deltas streamily, then we use about 384K of
> memory per delta (128K destination buffer plus 256K source buffer,
> assuming the plaintexts involved are at least 256K), which gets big too
> fast if the number of deltas grows large.
>
> * If we apply the deltas one after another, then we have to make a
> pass over each intermediate plaintext even if very little has changed.
> Plus we need intermediate storage equal to the size of the largest
> intermediate plaintext.
>
> * But if we combine the deltas, then we only need intermediate storage
> equal to the size of the largest delta.
>
> Another option is to apply the deltas streamily and try to keep the
> number of deltas small, by using some technique like skiplist deltas.
> If we did that, then even if there are 1024 revs of a file, there should
> be no more than about 20 deltas between any two revs, for at most 7.5MB
> of space required. (Which is still kind of big... we could cut it down
> to 2*windowsize by using a specialized chain-delta applicator which
> shares the destination view buffer of one delta and the source view
> buffer of the next. That might be over-optimizing, though.)
>
But has anyone else actually profiled the code to see *where* the time is
spent trying to get old revisions?
My profiles actually show decode_int and decode_instruction near the top,
together accounting for about 40% of the time; each is called 36 million times.
This is with a 512k svn_stream_chunk_size, so it has to decode a lot of
diffs to reconstruct a given revision.
But decoding shouldn't account for 40%+ of the time; applying the deltas
should.
The other top time waster is window_handler, which is what it *should*
be.
This is with --disable-shared, of course.
--Dan
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:05 2006