Re: [merge tracking] Behavior of 'log' operations

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: 2007-05-02 19:39:50 CEST

Daniel Rall wrote:
> On Wed, 02 May 2007, Hyrum K. Wright wrote:
>
>> Daniel Rall wrote:
>>> On Tue, 01 May 2007, hwright@tigris.org wrote:
>>> ...
>>>> --- trunk/www/merge-tracking/design.html (original)
>>>> +++ trunk/www/merge-tracking/design.html Tue May 1 12:11:20 2007
>>> ...
>>>> +The introduction of merge tracking changes that paradigm. Log messages
>>>> +for independent revisions are still linearly related as before, but log
>>>> +messages for merging revisions now have children. These children are log
>>>> +messages for the revisions which have been merged, and they may in turn
>>>> +also have children.
>>>> +
>>>> +The result is a tree structure which the repository layer builds as it
>>>> +collects log message information. This tree structure then gets serialized
>>>> +and marshaled back to the client, which can then rebuild the tree if needed.
>>>> +Additionally, less information needs to be explicitly given, as the tree
>>>> +structure itself implies revision relationships.
>>>> +
>>>> +
>>>> +We currently use the <code>svn_log_message_receiver_t</code> interface
>>>> +to return log messages between application layers. To enable a tree
>>>> +structure, we add another parameter, <code>child_count</code>. When
>>>> +<code>child_count</code> is zero, the node is a leaf node; when
>>>> +<code>child_count</code> is greater than zero, the node is an interior node,
>>>> +with the given number of children. These children may also have children and
>>>> +indicate such by their own <code>child_count</code> parameters. Consumers of
>>>> +this API can be aware of the number of children and rebuild the tree, or pass
>>>> +the values farther up the application stack. In effect, this method implements
>>>> +a preorder traversal of the log message tree.
>>> When requested, I'd expect the API to return a "log info" data
>>> structure which has an "apr_array_header_t *children;" field (with
>>> elements of type "log info", fleshed out). With such a structure, the
>>> nelts field of the children's container could be used in lieu of a new
>>> child_count field.
>> Would this affect the streamy-ness of the API? We would have to wait
>> until all the children have been fetched (a potentially long operation)
>> before returning any of them. Using the child_count scheme, it seems
>> like a client could reconstruct the tree, if it needed the data in tree
>> form.
>>
>> In the case of our command line client, we don't need a tree. We
>> can output the messages as they are received, keeping track of the
>> "Result of merge" values in a stack in the receiver baton.
>
> Wouldn't this require opening multiple RA sessions to the
> repository? (Meaning multiple TCP sockets, in the usual case.)

Nope. We send multiple log messages serially down one RA session now,
don't we?

This scheme requires that the children be sent in-band, right after
their parent. That wouldn't require any connections beyond the one
that we are currently using.

Maybe an example would help:
Assume a simplified (x, y) tuple to represent the log message for
revision x with child_count y. Using the last example in the functional
spec, the series of messages would be:
(24, 2) (14, 0) (12, 2) (10, 0) (9, 0)

This unambiguously defines the tree, which can be rebuilt by clients
that need it. Our client doesn't, and can just spit the messages out in
the order it gets them.
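
For clients that do want the tree, the reconstruction could be sketched
roughly as follows. This is only an illustration in Python, not the C
API under discussion: LogNode and rebuild_tree are hypothetical names,
and the input is the simplified (revision, child_count) tuple stream
from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class LogNode:
    """Hypothetical in-memory node for one log message."""
    revision: int
    children: list = field(default_factory=list)


def rebuild_tree(messages):
    """Rebuild the log message tree from a preorder stream.

    `messages` is an iterable of (revision, child_count) pairs in the
    order they arrive. Returns the list of top-level nodes.
    """
    roots = []
    stack = []  # entries are [node, children still expected]
    for revision, child_count in messages:
        node = LogNode(revision)
        if stack:
            # This message is the next child of the innermost open parent.
            stack[-1][0].children.append(node)
            stack[-1][1] -= 1
        else:
            roots.append(node)
        if child_count > 0:
            stack.append([node, child_count])
        # Close every parent whose children have all arrived.
        while stack and stack[-1][1] == 0:
            stack.pop()
    return roots


roots = rebuild_tree([(24, 2), (14, 0), (12, 2), (10, 0), (9, 0)])
```

With the stream above, r24 ends up with children r14 and r12, and r12
with children r10 and r9. Only the currently open parents are held on
the stack, so a client that does not need the tree can skip this step.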

>> If it is a case of convenience, we could provide a receiver function for
>> clients that builds the log message tree.
>
> The command-line client will always need the data in tree form when
> invoked with the --merge-sensitive option (which I'm assuming will be
> a new boolean on the RA log API). In terms of network I/O, it'd be
> less efficient to require O(N) RA calls than a single call which
> fetches all the data. Yes, the data will come down in a potentially
> "jerky" fashion (for deeply nested trees, which won't be the usual
> case), but will remain streamy for log info with no nesting. I'm not
> sure what this means in terms of CPU usage and disk I/O on the
> repository-side; we may need to profile to determine the best
> strategy.

Why will the command line client always need the data in tree form? The
end result is output to the terminal, which is serialized, and basically
a preorder traversal of the message tree. If we serialize the messages
through the RA session in the same way, why do we need to construct and
then traverse the tree at the client? We should be able to just spit
out the messages as they come, with appropriate merge tracking
information pulled from the baton.
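
A rough sketch of such a receiver, again in Python rather than C and
under stated assumptions: stream_log is a hypothetical name, the
"result of merge" wording is made up for illustration and is not the
client's real output format, and the stack stands in for the state that
would be kept in the receiver baton.

```python
def stream_log(messages):
    """Yield one output line per message, in arrival order.

    `messages` is an iterable of (revision, child_count) pairs in
    preorder. No tree is built; the only state kept is a stack of the
    merging revisions that are still waiting for children.
    """
    stack = []  # the "baton": [merging revision, children still expected]
    for revision, child_count in messages:
        line = f"r{revision}"
        if stack:
            # Every open parent is a merge that brought this revision in.
            merged_via = ", ".join(f"r{rev}" for rev, _ in stack)
            line += f" (result of merge: {merged_via})"
            stack[-1][1] -= 1
        if child_count > 0:
            stack.append([revision, child_count])
        # Drop parents whose children have all been seen.
        while stack and stack[-1][1] == 0:
            stack.pop()
        yield line


for line in stream_log([(24, 2), (14, 0), (12, 2), (10, 0), (9, 0)]):
    print(line)
```

Each line can be emitted as soon as its message arrives, so the output
stays streamy no matter how deep the nesting goes.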

> In the case where we know we'll want the extra log data, we should
> request it up front.
>
> In the case where we don't need the data, we won't request it. The
> func spec doesn't say anything about showing merge tracking info for
> this case -- I assume we want to stick with that?

Yes. The server should only return merge tracking data when requested.

Am I making much sense?

-Hyrum

Received on Wed May 2 19:40:12 2007
