Re: Whither apply_textdelta

From: Greg Stein <gstein_at_lyra.org>
Date: 2003-02-13 04:27:09 CET

[ (was Re: SVNSERVE Tests Failing) ... no wonder I missed it :-) ]

On Tue, Feb 11, 2003 at 05:51:05PM -0500, Greg Hudson wrote:
>...
> P1: Neon doesn't have a push interface for request bodies, so right now
> ra_dav buffers text deltas on commits. There's a legitimate question of
> whether solving this problem is allowing the tail to wag the dog.
>
> P2: We send self-compressed deltas on checkout and import. This saves
> bandwidth but takes time, and there might be more efficient ways to save
> time and space using commodity compression code (in the ra layers).
> More measurement is required to see if we should stop self-compressing,
> but I'll assume for the sake of argument that we want to send full texts
> for imports and checkouts.
>
> P3: We use deltas when it makes no sense: over ra_local and over
> ra_svn/ra_dav connections to loopback. The editor is in the best
> position to notice these cases, since editor implementations live inside
> ra_lib and driver implementations do not.
>
> P4: We use deltas when it makes very little sense: over fast LANs. We
> shouldn't try to guess when bandwidth is cheaper than CPU time, but in
> an ideal world the user might be able to make that decision in
> .subversion/servers or on the command line.

  P5: The editor is in the best position to determine the format of the
  delta output (e.g. svndiff vs. unidiff) based on requirements or
  environmental/cmdline switches or whatever.
  (this consideration came up with the XML editor stuff; it is preferable to
   insert "plain text" diffs rather than base64-encoded svndiffs)

  P6: Developer-only consideration: we always send deltas, which precludes
  the option of fulltexts during development of new subsystems.
  (not sure if this is a problem today, but this was one of my original
   motivators)

[ and yes, I agree with P1..P4 ]

> Here are the solutions people have proposed, and how they solve the
> above problems:

Agreed, except for the following points:

> S1: Leave apply_textdelta alone.
>...
> S1P2: Instead of self-compressing when no base is available, fake up
> delta windows which just say "insert this blob of new text." This
> is just as efficient as any other proposed scheme for sending full
> texts.

Here, the checkout and import drivers are assuming that fulltext is always
the right thing to do. The editors may decide that self-compressing is the
right thing (think 56k modem). This is sort of a converse of P2..P4 where
the problem is that we want to *avoid* deltas. In certain cases, we also
*want* the deltas. As you point out, typically, the [RA] editor is in the
best position to know the requirements.

> S1P3: Teach the drivers to send full texts (using faked-up delta
> windows) according to a flag. Add a new function to ra_lib to ask
> whether deltification is desirable, which can be by libsvn_client to
> inform svn_wc_adm_crawl. On the server side, svnserve or
> mod_dav_svn can provide the flag to libsvn_repos.

Right.

> S1P4: Leverage S1P3; simply pass the full text flag to drivers if
> configuration says it's appropriate.

Probably a tri-state: always-full-text, never-full-text, default.

S1P5: no solution? This would be a transformation from windows to a
full text to a GNU diff somehow.

S1P6: leverage S1P3/S1P4; the developer just munges the flag to ensure
that only fulltexts are used.

> S1 variant #1: Keep apply_textdelta but also add apply_fulltext. This
> would let S1P[2-4] be more elegant, though not more efficient.

Is the implication here, that all editors support both code paths, and the
driver chooses?

> S1 variant #2: Modify apply_textdelta so that it can return a flag
> saying "you can send me deltas if you want, but I think bandwidth is
> cheap; you should send me full text instead for better performance."
> Eliminates the need to pass flags to drivers to solve P3 (though that
> mostly gives up on P4). Easiest if we don't use variant #1.

I'm not sure about the "gives up on P4". Couldn't the editor read the
necessary config to decide what flag value to return?

> S1 variant #3: Invert apply_textdelta without worrying about full
> texts. (By "invert" I mean that the driver provides a callback which
> allows apply_textdelta to gather delta windows, instead of the editor
> providing a callback which the driver pushes delta windows onto.)
> Replace or agument svn_txdelta_to_svndiff with a function that pushes in
> the opposite direction. Solves P1, doesn't affect P[2-4]. Compatible
> with variant #1 or #2. Inconsistent with the rest of the editor
> interface, which uses a push model.

The notion of inconsistency is a bit of a red herring, I think. There are no
other streams in the editor interface. We "push" values at the editor in the
form of arguments, and you could also say that we push streams at the
editor. Sure, the editor happens to read from the stream, so it can be
called "pulling" the data, but that isn't inconsistent with anything else in
the editor.

Except for that line, I agree with the commentary.

> S2: Replace apply_textdelta with apply_text([base stream], target
> stream).

Note the optional base.

> The ra_svn receiving editor uses a callback to get at base
> texts from the FS or WC as appropriate. ra_dav uses a callback to get
> at base texts from the WC for update/switch/diff operations

I disagree with this part of the solution. Since the base is optional, the
driver simply passes NULL to the editor. It is assumed that the editor (the
FS or WC) has the base. In this case, yes: it does. The original design also
stated that the editor is free to raise an error if a base is not provided,
it requires one, and it cannot find the base through other means.

Thus, callbacks are not required.

> (no, the
> pain is not limited to ra_svn, you just won't notice it as much inside
> ra_dav because of the background pain level).

I expect that the "background pain" you refer to will diminish. Per previous
emails, I think that is mostly due to slowly building insanity in ra_dav
rather than anything truly central.

> Inconsistent with the
> rest of the editor interface which uses a push model.

Ahem. :-)

> S2P1: As with S1 variant #3, svn_txdelta_to_svndiff will need to be
> agumented or replaced with a function that pushes in the opposite
> direction. After that is done, it is easy enough to make ra_dav's
> apply_text supply svndiff data on demand from Neon.

The core functionality is a read-stream which serializes windows into the
svndiff format. The stream would be initialized with an svn_txdelta_stream_t
for reading the windows. Presumably, some kind of internal buffering would
happen for unread, serialized windows. The logic is something like:

  def svndiff_formatter_read(self, buf, buflen):
    if not self.buffer:
      window = svn_txdelta_next_window(self.txdelta)
      self.buffer = svndiff_serialize(window)
    result = self.buffer[:buflen]
    self.buffer = self.buffer[buflen:]
    return result

ra_dav can then read from the formatter svn_stream to satisfy Neon's
callbacks for more data.

> S2P2: For checkout and import, pass NULL for the base stream. The
> editor can still self-compress if it wants to, or it can send a full
> text.
>
> S2P3: The editor can choose to ignore the base stream and send a
> full text if it believes bandwidth to be cheap.
>
> S2P4: Editors could be passed a flag at creation time to use full
> text instead of deltas, in which case they would ignore the bsae
> stream.

Agreed on all three. I would also augment S2P4 with the fact that an editor
can figure out the flag by itself, if it has access to the right bits.

S2P5: the editor generates the appropriate output from the base and
target, based on whatever configuration/setting.

S2P6: some #ifdef in the editor controls its behavior

> S2 variant #1: Replace apply_textdelta with apply_text(target stream);
> the base text becomes implicit. ra_dav uses a callback to get at base
> texts from the WC for commit as well as update/switch/diff operations.
> S2P2 changes a little bit, in that the callback will say "there is no
> base text" and then the editor decide whether to self-compress or send
> full text.

ra_dav expects a callback to get base texts for when it retrieves files.
This could leverage that callback. However, I prefer an editor interface
that has the base stream explicit so that these callbacks are not required.

Hmm. I keep thinking that apply_text should have a mime type for the target
stream to indicate that the stream is a diff rather than a fulltext.

Oh: the guys convinced me that "heck, the driver should just rebuild the
fulltext and pass that in [along with the base] rather than complicating the
interface with a possibly-diff/possibly-full stream argument." And now we
see the results of that: the driver may have to reach into some other system
to fetch the base, when it doesn't necessarily have it. Thus, it is best to
allow for diffs to flow through the system. If somebody *wants* a full text,
then assume they have access to the base.
[ and that assumption holds water in all of our cases right now ]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Thu Feb 13 04:23:34 2003

This message: [ Message body ]
Next message: Greg Hudson: "Re: Whither apply_textdelta"
Previous message: mark benedetto king: "Re: svn commit: rev 4775 - in branches/issue-1037-uuids/subversion: include libsvn_repos"
In reply to: Greg Hudson: "Whither apply_textdelta (was Re: SVNSERVE Tests Failing)"
Next in thread: Greg Hudson: "Re: Whither apply_textdelta"
Reply: Greg Hudson: "Re: Whither apply_textdelta"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]