
Re: RFC: Delta indexing and composition

From: <kfogel_at_collab.net>
Date: 2001-07-31 02:59:54 CEST

Branko Čibej <brane@xbc.nu> writes:
> There's the SIZE field in the DIFF production ...
>
> >or even
> >
> > DELTA ::= (("delta" FLAG ...) (RANGE-START RANGE-END WINDOW) ...) ;
> >
> >That way, you'd be able to find out the total size of a reconstructed
> >text without actually having to reconstruct it. Well, perhaps if
> >that's the problem, it would be simpler just to stick with your
> >original scheme and add one more piece of data:
> >
> > DELTA ::= (("delta" FLAG ...) RECONSTRUCTED-SIZE (OFFSET WINDOW) ...) ;
> >
> ... but yes, the whole idea of this is to have an ordered sequence of
> windows that you can do binary searches on, keyed by offsets into the
> reconstructed plaintext.

Oh, I think I misread `structure', or mis-applied your proposed
changes.

As long as we can discover the correct window without actually
retrieving any windows, that's enough. That is, just by walking through
the rep skel's sublists, we can know what diff windows we're
interested in and which we're not -- that's the important thing. If
we have to start retrieving strings to find it out, then we lose.
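
To make the lookup concrete, here is a rough sketch (Python, purely
illustrative, not actual Subversion code) of picking windows from the
skel metadata alone. It assumes the (OFFSET WINDOW) variant with a
RECONSTRUCTED-SIZE field, that offsets are sorted and begin at 0, and
that each window runs up to the next offset. The name
`windows_for_range` and its parameters are made up for this example.

```python
from bisect import bisect_right

def windows_for_range(offsets, total_size, start, length):
    """Return indices of the windows overlapping [start, start + length).

    offsets    -- sorted plaintext offsets, one per window, offsets[0] == 0
    total_size -- RECONSTRUCTED-SIZE of the whole text
    Only the offsets are consulted; no window data is retrieved.
    """
    if length <= 0 or start < 0 or start >= total_size:
        return []
    end = min(start + length, total_size)
    # The last window whose offset is <= start contains the first byte.
    first = bisect_right(offsets, start) - 1
    # Likewise for the final byte of the requested range, end - 1.
    last = bisect_right(offsets, end - 1) - 1
    return list(range(first, last + 1))

# Hypothetical rep with four windows and a 500-byte reconstructed text.
offsets = [0, 100, 250, 400]
print(windows_for_range(offsets, 500, 90, 20))
```

Only the windows whose indices come back would then need their strings
fetched from the database.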

> Oh, no. This is strictly post-M3 stuff. It doesn't add any
> functionality, it's only a performance improvement.

True. I think we may want a performance improvement before M3, though
I'll run some tests first to make sure. :-) There are some cheap ones
available, if we do...

> > It's okay if you don't;
> >just check in your design to the notes/ directory somewhere, and Mike
> >and/or I will get to it.
> >
> Will do that.

Thanks!

> The Sleepycat home page only mentions database and object size limits,
> but doesn't say anything about performance. I think we should ask them.

We did... Oooh, but that was for writing partial records (see Keith
Bostic's mail in the dev archives).

> My gut feeling is accessing the data in the database should in general
> be faster than your common or garden filesystem, and that supporting
> external storage is more important for administrators than for users.

Yeah, let's not go into it now.

> We'll definitely have to think about this if we want to support large
> (i.e., several GB + years of history) repositories, distributed over
> several filesystems. Every repository-like database I've seen does
> things this way.
>
> We'll also have to invent a way to expunge old revisions and metadata
> from the repository to secondary storage, or for archiving; but the two
> aren't strictly related.
>
> In any case, our schema is flexible enough that we can implement
> out-of-DB storage on top of it (e.g., by extending the semantics of
> string keys to include external references). But personally I'd like to
> keep the option of having the whole thing in a single database, because
> it's much handier for smaller repositories.

I don't see how it's handier, actually, but don't mean to start an
argument by that (just not seeing your meaning, is all). But agree we
shouldn't bother with it right now.

-K

Received on Sat Oct 21 14:36:33 2006

This is an archived mail posted to the Subversion Dev mailing list.