
Re: random access delta streams

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2000-11-14 06:30:19 CET

> I'm thinking about storing really, really large objects in the
> Subversion repository. How hard would it be to implement seeking on
> the result of applying a seekable text delta to some seekable
> stream?

This should be possible. What you'd do is read through the delta and
build a table of contents for the target file. Then for each
seekable_read() call, you'd look up the appropriate delta windows in
the table of contents, read those windows from the delta stream, and
apply them to the appropriate source views from the source stream.
You might want to cache the last target view read, or you might not,
depending on the space/time tradeoffs and how much locality of
reference you enjoy.
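
A minimal sketch of the table-of-contents idea, under loud assumptions:
the window format here is made up for illustration (it is not
Subversion's delta format), the windows sit in a Python list rather
than being read back from a delta stream, and build_toc, apply_window,
and seekable_read are hypothetical names. The table of contents is
just the target offset at which each window's output begins.

```python
import bisect
import io

# Hypothetical window format: (src_off, src_len, ops), where each op is
# ("copy", offset_in_source_view, length) or ("new", data_bytes).

def window_target_len(ops):
    """Number of target bytes a window produces."""
    return sum(op[2] if op[0] == "copy" else len(op[1]) for op in ops)

def build_toc(windows):
    """Return (start offsets of each window in the target, total target size)."""
    toc, pos = [], 0
    for _src_off, _src_len, ops in windows:
        toc.append(pos)
        pos += window_target_len(ops)
    return toc, pos

def apply_window(source, window):
    """Read the window's source view and apply its ops to get a target view."""
    src_off, src_len, ops = window
    source.seek(src_off)
    view = source.read(src_len)
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += view[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)

def seekable_read(source, windows, toc, total, offset, length):
    """Return target bytes [offset, offset + length) without a full rebuild."""
    end = min(offset + length, total)
    if offset >= end:
        return b""
    # Last window whose target range starts at or before `offset`.
    i = bisect.bisect_right(toc, offset) - 1
    out = bytearray()
    while i < len(windows) and toc[i] < end:
        target_view = apply_window(source, windows[i])
        out += target_view[max(offset - toc[i], 0):end - toc[i]]
        i += 1
    return bytes(out)

source = io.BytesIO(b"abcdefghij")
windows = [
    (0, 5, [("copy", 0, 3), ("new", b"XY")]),
    (5, 5, [("new", b"Z"), ("copy", 1, 4)]),
]
toc, total = build_toc(windows)
result = seekable_read(source, windows, toc, total, 3, 4)
```

Only the windows that overlap the requested range get applied; caching
the last target view would save re-applying a window on sequential
reads, at the cost of one more buffer.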

Whether you'll get reasonable performance when you have a lot of
deltas in a chain, I'm not sure. For space: a 2GB file would have
about 20,000 windows (roughly 100K each), so the table of contents
alone for each delta would be 320K (two off_t values per window, 8
bytes per off_t). So if you have ten deltas being applied at once,
that's about 3.2MB, probably more like 5MB once you count all the
buffers which will be in use at once as the deltas are applied down
the chain.
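
The space arithmetic spelled out as a quick check. The counts and the
8-byte off_t come from the estimate itself; the implied window size is
simply 2GB divided by 20,000, and the 5MB figure for buffers is a guess
that this does not try to reproduce.

```python
OFF_T_BYTES = 8           # size of one off_t, as assumed in the estimate
WINDOWS = 20_000          # windows in a 2GB file
FILE_BYTES = 2 * 2**30    # 2GB
CHAIN_LENGTH = 10         # deltas applied at once

# Two off_t values per window in each table of contents.
toc_bytes = WINDOWS * 2 * OFF_T_BYTES
chain_toc_bytes = toc_bytes * CHAIN_LENGTH
implied_window_bytes = FILE_BYTES // WINDOWS

print(toc_bytes, chain_toc_bytes, implied_window_bytes)
```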

For time: performance for a single seek and read should be linear in
the number of diffs applied, but with fairly hefty coefficients.
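
To illustrate the linear behaviour, here is a rough sketch in which
each delta is wrapped as a seekable stream that can serve as the source
for the next delta in the chain. DeltaStream, make_chain, the window
format, and the windows_applied counter are all hypothetical and exist
only for this illustration; it models the best case, where window
boundaries line up at every level.

```python
import bisect
import io

class DeltaStream:
    """Seekable view of `source` with one delta applied (made-up format).

    Each window is (src_off, src_len, ops); an op is ("copy", off, n) into
    the window's source view, or ("new", data).
    """

    def __init__(self, source, windows):
        self.source, self.windows = source, windows
        self.starts, self.size = [], 0
        for _off, _len, ops in windows:
            self.starts.append(self.size)
            self.size += sum(op[2] if op[0] == "copy" else len(op[1])
                             for op in ops)
        self.pos = 0
        self.windows_applied = 0  # instrumentation for the demo only

    def seek(self, pos):
        self.pos = pos

    def read(self, n):
        start, end = self.pos, min(self.pos + n, self.size)
        if start >= end:
            return b""
        out = bytearray()
        i = bisect.bisect_right(self.starts, start) - 1
        while i < len(self.windows) and self.starts[i] < end:
            src_off, src_len, ops = self.windows[i]
            self.source.seek(src_off)
            view = self.source.read(src_len)  # recurses down the chain
            self.windows_applied += 1
            target = b"".join(view[op[1]:op[1] + op[2]] if op[0] == "copy"
                              else op[1] for op in ops)
            out += target[max(start - self.starts[i], 0):end - self.starts[i]]
            i += 1
        self.pos = end
        return bytes(out)

def make_chain(base, depth):
    """Stack `depth` copies of the same small delta on top of `base`."""
    windows = [(0, 5, [("copy", 0, 5)]),
               (5, 5, [("new", b"#"), ("copy", 1, 4)])]
    layers, stream = [], base
    for _ in range(depth):
        stream = DeltaStream(stream, windows)
        layers.append(stream)
    return layers

layers = make_chain(io.BytesIO(b"0123456789"), 4)
top = layers[-1]
top.seek(7)
byte = top.read(1)
total_windows = sum(layer.windows_applied for layer in layers)
```

Here a one-byte read applies one window per delta in the chain. The
hefty coefficient shows up in the fact that every level reads a whole
source view to produce that one byte, and a target window whose source
view straddles several windows of the delta below it costs more still.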

I could write the code; it sounds like fun. From an architectural
perspective, though, I'm not sure whether or where these functions
should go in the Subversion libraries. What would use them?
Received on Sat Oct 21 14:36:14 2006

This is an archived mail posted to the Subversion Dev mailing list.