Re: hmm. just realized something

From: Greg Stein <gstein_at_lyra.org>
Date: 2000-08-15 10:03:44 CEST

On Mon, Aug 14, 2000 at 10:14:11PM -0500, Karl Fogel wrote:
> Greg Stein <gstein@lyra.org> writes:
> > I just realized that the whole XML marshalling of the deltas/skeltas is kind
> > of a moot issue. That isn't the form that will be used between the client
> > and the server (a delta/skelta will be broken down into a linear sequence of
> > DAV operations).
>
> Right.
>
> By the way, skeltas may be extinct.

Not a problem. They introduce an extra step in the DAV design that isn't
strictly needed. Punting them shouldn't be a problem.

> Jim pointed out that they violate
> the rule that one user's net connection being flaky should never make
> other users wait. But if someone's skelta gets approved into the
> pending pool, and then their connection flakes out before their whole
> delta is transmitted, this can affect other people's commits.

Well... making the skelta perform any kind of a state change is the basic
issue. If it is viewed simply as a preflight test, then you're okay. The
*real* change comes later and (hopefully) succeeds. (but the race condition
between skelta-OK and delta-apply could let in other changes and thus the
delta could still fail)

> The
> problem can be alleviated somewhat with a timeout, but it's still a
> pretty good principle to avoid violating. Further persuasion is that
> it's a _lot_ simpler to implement Subversion with single-transmission
> commits first, and then add the skelta-then-delta method iff it's
> clearly needed, the latter functionality being just a superset of the
> former.

The skelta is nice from the standpoint of finding basic problems before
transmitting that 10M file over the 56k uplink.

However, if the tree format is tweaked/munged/tossed, then the large file
uploads could come at the end of the change set. This would allow a failure
to abort the change set before bothering with the file uploads.

If you keep the tree structure, then I'd recommend keeping the skelta
concept.

> > I'm not exactly sure what kind of impact that has, but it seems rather
> > large.
> >
> > Ideally, what will happen:
> >
> > *) the client constructs a delta/skelta data structure
> > *) that structure is passed to the client network library
>
> Remember that it is constructed and passed streamily, not all at once,
> too.

Shouldn't be an issue. It appears that the tree structure of a delta can be
streamily transformed into a proper request sequence.

> > *) the network library marshals it across the network in whatever form makes
> > the most sense
> > *) the server constructs a delta/skelta data structure from the marshalled
> > form
>
> Again, it has to construct it and pass it down on the fly -- no
> marshalling (unless you're using that term to mean something else).

"marshal" in the generic sense of transforming a memory structure to a
stream format for storage/transmission.

The request/response nature of HTTP monkeys up the construction of a stream
for the complete change set.

*ponder*

I see three possibilities here:

1) As each file arrives during the delta transmission, it is placed into a
   temporary file on the server. When the entire delta has arrived and a
   stream is constructed for passing into Subversion, the file contents are
   spooled out of the temporary files and into the delta stream.

This has the obvious effect of a double copy of the file contents. It is
copied to disk, then copied into the SVN repository.

2) As the file arrives, it is stashed directly into the SVN repository. When
   the delta stream is constructed, it simply refers to the files that are
   already present in the SVN repository.

3) Punt the decomposition of a tree into a set of change requests.

My favorite is #2. I know that we had already planned on allowing files to
be stored into the repository independent of an actual change set being
applied (to allow the repository to avoid locks during the long spool time
of uploading file contents). The #2 option is simply a bit more explicit
about that operation.
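
A minimal sketch of option #2, under loud assumptions: the Repository class,
its stash() and contents() methods, the "blob-N" handles, and build_delta()
are all hypothetical names invented for illustration, and an in-memory dict
stands in for the real repository. It only shows the shape of the idea: file
contents are stored as they arrive, and the delta later refers to them by
handle instead of carrying the bytes.

```python
import io
import itertools


class Repository:
    """In-memory stand-in for the SVN repository (hypothetical API)."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count(1)

    def stash(self, stream, chunk_size=4096):
        """Store contents chunk by chunk as they arrive; return a handle."""
        handle = f"blob-{next(self._ids)}"
        buffer = io.BytesIO()
        # Read until the stream returns an empty bytes object (end of upload).
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            buffer.write(chunk)
        self._blobs[handle] = buffer.getvalue()
        return handle

    def contents(self, handle):
        return self._blobs[handle]


def build_delta(repo, uploads):
    """Return change items that reference stashed contents by handle."""
    return [("replace_file", path, repo.stash(stream)) for path, stream in uploads]


repo = Repository()
delta = build_delta(repo, [("file1", io.BytesIO(b"new contents of file1"))])
print(delta)
```

The change items stay small no matter how large the uploaded files are, which
is what lets the final apply step avoid holding locks during the upload.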

> > *) the data structure is passed to the SVN server library
> >
> > The XML marshal format is only useful in a non-DAV context. By hard-wiring
> > the XML format/concept into the code, it seems that we are also making a
> > number of decisions about how the network layers will work.
>
> Hmmm.
>
> There is the decision that no one will ever have to hold an entire
> delta in memory at once, which implies a streamable representation.
> That seems like it should be true no matter what.

I've been working under the assumption that mod_dav_svn will construct a DAV
activity on the server. The activity would effectively be a set of change
records stored to disk (for persistence between each HTTP request). When the
MERGE arrives, the activity is pulled off of the disk and passed into SVN.

So yes: it certainly won't be in memory.

Also, I agree: A streamable representation is good, and allows us to avoid
issues with unbounded commit sizes.
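
A rough sketch of that persistence model, assuming a JSON-lines file as the
on-disk form. The Activity class, the record layout, and merge() are
hypothetical; nothing here is the actual mod_dav_svn design. Each HTTP request
appends one change record, and the MERGE step streams the records back out one
at a time, so the whole change set is never held in memory.

```python
import json
import os
import tempfile


class Activity:
    """Change records persisted to disk between requests (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # One record per line, appended as each request arrives.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def replay(self):
        # Generator: yields records one at a time instead of loading them all.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


def merge(activity, apply_change):
    """Push each stored record into the repository layer; return the count."""
    count = 0
    for record in activity.replay():
        apply_change(record)
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    activity = Activity(os.path.join(tmp, "activity.jsonl"))
    activity.append({"op": "replace_file", "path": "file1"})
    activity.append({"op": "delete", "path": "old.c"})
    applied = []
    merged = merge(activity, applied.append)
print(merged, applied)
```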

> And there is the decision that Subversion's internal streamable
> representation of a delta reflects the hierarchy of the tree
> structure(s) that the delta is changing.
>
> It's the latter that's causing a problem for DAV, right?

The tree structure and a linear sequence of changes are functionally
equivalent. I'm not having any problems (with the translation or otherwise).

My original point is that the XML format that is specified in the design
document, and the parser work that is being done, are possibly moot. (sorry
Ben!) There just isn't a point in the sequence where the XML format is
required.

> I think we're encountering the technical realities of an essentially
> political decision. :-)

Well, I wouldn't call it political. A few too many connotations there :-)

I'd say the choice of using a tree vs a sequence is having an effect. But
that isn't posing any problems for me since they are equivalent (through
some well-defined transformations).

Hmm. Okay... I guess I'd rewrite my email as "the XML form is moot. my
second point is that the ideal situation is that the client network library
has change-items pushed into it. it marshals and transmits those. the server
then pushes the change-items into the server-side SVN library."

Part of this ideal also arises out of the recognition that the tree and
linear forms are equivalent. Consider the client side:

    while walking_working_copy():
        generate_change_XML()

Each of those generate_change_XML() calls can be viewed as an "event." Sure,
the sequence has a bunch of element-start and element-end markers which
create a tree or a scoping, but the fact is that you have a sequence of
events. A simple example:

    <replace name="file1">
      <file>text-delta</file>
    </replace>

is equal to the following event sequence:

    START: replace
    ATTR: name=file1
    START: file
    CDATA: text-delta
    END: file
    END: replace
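
The equivalence can be checked with a small sketch using Python's standard
xml.sax module. The EventRecorder class and xml_to_events() are illustrative
names, not anything in the Subversion code; the sketch just records the parser
callbacks in the START/ATTR/CDATA/END notation used above.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record SAX callbacks as (kind, value) pairs."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("START", name))
        for key in attrs.getNames():
            self.events.append(("ATTR", f"{key}={attrs.getValue(key)}"))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(("CDATA", content.strip()))

    def endElement(self, name):
        self.events.append(("END", name))


def xml_to_events(text):
    recorder = EventRecorder()
    xml.sax.parseString(text.encode("utf-8"), recorder)
    return recorder.events


SAMPLE = '<replace name="file1"><file>text-delta</file></replace>'
for kind, value in xml_to_events(SAMPLE):
    print(f"{kind}: {value}")
```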

The "digger" thing in the current code is all about transforming an XML tree
into a meaningful sequence of events. Each of those callbacks is an event:
delete, entry_pdelta, add_directory, replace_directory, etc.

Imagine if the server simply called those callbacks directly. Why use a
tree structure, as long as the sequence of calls is identical? The above
example issues a replace_file() call. Does it matter whether replace_file()
was called during a tree structure walk or via a sequence of calls?
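
To make the question concrete, here is a hedged sketch. RecordingEditor, the
nested-dict tree format, and both driver functions are made up for
illustration; only the callback names replace_file and delete come from the
discussion above, and their argument lists are assumed. The receiver records
the calls it gets, and the two drivers produce the same call sequence.

```python
class RecordingEditor:
    """Stand-in for the server-side library; records each callback."""

    def __init__(self):
        self.calls = []

    def replace_file(self, path, text_delta):
        self.calls.append(("replace_file", path, text_delta))

    def delete(self, path):
        self.calls.append(("delete", path))


def drive_from_tree(node, editor, prefix=""):
    """Walk a nested dict depth-first, issuing one callback per leaf.

    Assumed encoding: dict = directory, None = delete, str = text delta.
    """
    for name, change in node.items():
        path = prefix + name
        if isinstance(change, dict):
            drive_from_tree(change, editor, path + "/")
        elif change is None:
            editor.delete(path)
        else:
            editor.replace_file(path, change)


def drive_from_sequence(items, editor):
    """Issue callbacks straight from a flat list of (name, *args) tuples."""
    for op, *args in items:
        getattr(editor, op)(*args)


tree = {"file1": "text-delta", "dir": {"old.c": None, "new.c": "other-delta"}}
sequence = [
    ("replace_file", "file1", "text-delta"),
    ("delete", "dir/old.c"),
    ("replace_file", "dir/new.c", "other-delta"),
]

from_tree, from_sequence = RecordingEditor(), RecordingEditor()
drive_from_tree(tree, from_tree)
drive_from_sequence(sequence, from_sequence)
print(from_tree.calls == from_sequence.calls)
```

The editor cannot tell which driver called it, which is the point: the tree
is one way of producing the sequence, not something the receiver depends on.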

> On that note, it's high time for me to get some sleep, yikes.
>
> If by "heads down" you meant "encountering big obstacles", maybe try
> describing the problems to the list?

No problems. Just simplifications and a reduction in the amount of code that
needs to be implemented. e.g. why deal with XML parser code if it isn't
going to be used? You know me: "pragmatic" is my favorite word. I love to
avoid writing code :-)

"heads down" meant busy doing a brain dump into the document. I've spent a
good while reviewing docs and thinking/planning/designing. It is simply
taking me a bit longer than I had envisioned to get it all onto "paper."

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Received on Sat Oct 21 14:36:06 2006
