"Hyrum K. Wright" <hyrum_wright_at_mail.utexas.edu> writes:
> I've been poking around on the fs-rep-sharing branch lately to try and
> determine how we can avoid circular deltas when reusing
> representations. For those unfamiliar with the problem, here's a brief
> synopsis. (Apologies if this sounds like a core dump from my brain...)
>
> The purpose of fs-rep-sharing is to eliminate storing duplicate
> representations in the fs. Instead of just writing a representation
> to the database, we first check a cache of
> checksum->representation-key mappings, and see if we can reuse an
> existing representation. This is currently implemented for Berkeley
> DB.
>
> Because the representations themselves are mutable in BDB (i.e., the
> same representation can be full-text, and later rewritten as a delta
> against another full text), a problem arises where we can get circular
> delta references. Consider the following:

Sanity check: are you using the word "representation" for what the
repository (at least BDB) code calls "strings"? Because there's
something else in there called "representations", but they're not
deltified -- they're just a level of indirection between the filesystem
and the file contents.

I'll assume you're talking about "strings" from here on.

> +-----+
> |     |
> |     |
> |     |
> +-----+
>   A_1
>
> A_1 is the first representation of the object, so we store the full text.
>
> +-----+       +-----+
> |     | delta |     |
> |     | ----> |     |
> |     |       |     |
> +-----+       +-----+
>   A_1           A_2
>
> A_2 is a subsequent representation of the object, so we store its full
> text, and store A_1 as a delta against it. Now, imagine we revert back
> to the previous text in a later revision. Without rep-sharing we get:
>
> +-----+       +-----+       +-----+
> |     | delta |     | delta |     |
> |     | ----> |     | ----> |     |
> |     |       |     |       |     |
> +-----+       +-----+       +-----+
>   A_1           A_2           A_3
>
> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
> being unchanged. However, with rep-sharing, we get:
>
>    +-------------------------+
>    v                         |
> +-----+       +-----+        |
> |     | delta |     | delta +-+
> |     | ----> |     | ----> |o|
> |     |       |     |       +-+
> +-----+       +-----+
>   A_1           A_2         A_3
>
> Where both A_1 and A_2 remain unchanged, and A_3, instead of being a new
> representation, just refers to the previously written identical
> representation, A_1. This turns out to be a problem, because A_3 is
> really just A_1, which is a delta against A_2, which is a delta against
> A_1, which is a delta against A_2, ad nauseam.
>
> See the problem?

Well, no, wait. Under a sharing scheme, if, when you add a string, you
discover it has the same (fulltext) checksum as some existing string,
then no new deltification happens -- instead, the new string is just a
reference to the existing string. No cycle danger here.

In other words, both A_1 and A_3 would be deltas against A_2.
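
To make that concrete, here is a minimal sketch in Python of the
sharing scheme as described above. It is a toy model, not the BDB
code: the StringStore class, its field names, the key format, and the
choice of SHA-1 as the checksum are all assumptions made for
illustration.

```python
import hashlib


class StringStore:
    """Toy model (hypothetical): a string is a fulltext or a delta against another string."""

    def __init__(self):
        self.base = {}         # key -> key it is a delta against (None = fulltext)
        self.by_checksum = {}  # fulltext checksum -> key of the existing string
        self.latest = None     # key of the most recently created string

    def add(self, text):
        # SHA-1 is an assumption here; any fulltext checksum would do.
        digest = hashlib.sha1(text.encode()).hexdigest()
        if digest in self.by_checksum:
            # Same fulltext already stored: hand back a reference to it.
            # Nothing is rewritten and no new delta is created.
            return self.by_checksum[digest]
        key = f"s{len(self.base) + 1}"
        self.base[key] = None                # the new string is stored as fulltext
        if self.latest is not None:
            self.base[self.latest] = key     # previous fulltext becomes a delta against it
        self.by_checksum[digest] = key
        self.latest = key
        return key

    def chain(self, key):
        """Follow delta bases from key to a fulltext; raise if a cycle is found."""
        seen = []
        while key is not None:
            if key in seen:
                raise RuntimeError("circular delta reference")
            seen.append(key)
            key = self.base[key]
        return seen


store = StringStore()
a1 = store.add("first text")    # A_1: fulltext
a2 = store.add("second text")   # A_2: fulltext, A_1 becomes a delta against it
a3 = store.add("first text")    # A_3: same checksum as A_1, so just a reference
```

In this model A_3 is the same string as A_1, and walking its chain
visits A_1 and then A_2 and stops at a fulltext, so no cycle appears.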

I didn't read the rest of the mail closely, because it seemed to be
solving a problem whose existence I'm denying here :-).

-Karl
Received on 2008-03-27 20:40:55 CET