[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1136294 - in /subversion/trunk/subversion/libsvn_ra_serf: ra_serf.h util.c

From: Lieven Govaerts <svnlgo_at_mobsol.be>
Date: Sun, 19 Jun 2011 10:59:42 +0200

On Thu, Jun 16, 2011 at 10:19 PM, Greg Stein <gstein_at_gmail.com> wrote:
> On Thu, Jun 16, 2011 at 02:38, Lieven Govaerts <svnlgo_at_mobsol.be> wrote:
>>...
>>> (here for review; more code to follow...)
>>
>> Interesting patch.
>
> Was hoping you'd have time to look at it. Thanks!
>
>>...
>>> +  /* If "pausing" is to be allowed, then this must refer to the connection's
>>> +     bucket allocator. It will be used to hold content in memory, then
>>> +     return it once it has been consumed.
>>> +
>>> +     NOTE: if this field is NULL, then pausing is NOT allowed.  */
>>> +  serf_bucket_alloc_t *pause_alloc;
>>> +
>>> +  /* If pausing is allowed, then callbacks can set this value to pause
>>> +     the XML parsing. Note that additional elements may be parsed once
>>> +     this value is set (as the current buffer is consumed; no further
>>> +     buffers will be parsed until PAUSED is cleared).
>>> +
>>> +     At some point, the callbacks should clear this value. The main
>>> +     serf_context_run() loop will notice the change and resume delivery
>>> +     of content to the XML parser.  */
>>> +  svn_boolean_t paused;
>>
>> If delivery is resumed from say disk, it might be 'a while' before all
>> cached data is consumed and svn is reading from the socket again. To
>> avoid dropped connections, we might design it so that even while the
>> XML parser is not paused, svn still continues to read from the socket
>> and adds the data to the memory/disk cache.
>
> Ah. Good point.
>
> I suspect that we'll generally be okay because the parser will paused
> once N requests get queued. And reaching N could be below the server
> timeout.
>
> I was certainly planning to consume from memory and disk, and append
> if more stuff arrived.
>
> Boy, it would be nice to have multiple threads :-)
>
>>> +  /* While the XML parser is paused, content arriving from the server
>>> +     must be saved locally. We cannot stop reading, or the server may
>>> +     decide to drop the connection. The content will be stored in memory
>>> +     up to a certain limit, and will then be spilled over to disk.
>>> +
>> Yes, this is the idea. I have been thinking of putting this
>> functionality in a bucket, e.g. a cache_bucket. Besides the callback
>> it fits the bucket model quite well.
>>
>> The benefits of such a bucket are improved locality of the code, but
>> also that you can put this cache_bucket in the chain before the
>> deflate_bucket, so that serf can stored the incoming data compressed
>> into memory, possibly not even requiring disk access.
>
> I saw this from your patch, but didn't think that it made sense.

Of course it does! :-)

In the absence of threads, it is the application's responsibility
during handle_response to read regularly from the response bucket serf
provides, to avoid timing out the socket (this should be added to the
serf_response_handler_t docu btw). The application also needs to
consume the data that it reads, or at least store it somewhere for
later usage. This is what you're implementing now. I think such a
functionality would fit in serf itself, because a. other applications
can have the same issues and b. serf can do this more efficiently due
to access to the uncompressed data.

Also...

> We need to read from the network and put that "somewhere" while the
> parser is paused. But what that means is you have a read/write bucket,
> which I disfavor. To draw a picture:
>
> read <- C <- N
>
> where C is your Cache bucket, and N is the Network bucket. But when
> the parser is paused, you have:
>
> C <- write <- read <- N
>

Hm, when the parser is paused, the cache bucket would behave like this:

svn <----read----- C <----read---- N
 | |
 --can't consume---

Where svn can tell serf that while it will still continue to read, it
can't consume the data. The cache bucket will 'grow bigger and bigger'
as it continues to read from the network socket and stores the data
locally, until the application can consume the response again.

> That just doesn't look like a typical stack of buckets. I think the

I'd call the cache bucket an overflow dam, which can be used to
prevent flooding downstream. :)

> ideal bucket scenario is "build them once, then read-only." A bucket
> can certainly generate "more" data internally when a read request
> occurs. That is just dynamically fulfilling its logical content. But
> altering the content represented over time feels icky.

The content is unaltered, but delayed over the time axis indeed.

So you also dislike the stream buckets (= aggregate bucket + onhold)
and the snapshot API, as they basically provide the same thing?
(rhetorical question btw, I know you dislike them).

>
> Thus, I set up the pending_t structure. That structure would be quite
> useful in other areas of our codebase, but (until needed) I made it
> local to ra_serf for now.
>
>>...
>>> +/* As chunks of content arrive from the server, and we need to hold them
>>> +   in memory (because the XML parser is paused), they are copied into
>>> +   these buffers. The buffers are arranged into a linked list.  */
>>> +struct pending_buffer_t {
>>> +  apr_size_t size;
>>> +  char data[PARSE_CHUNK_SIZE];
>>> +  struct pending_memnode_t *next;
>>> +};
>>
>> Why not store this in an aggregate bucket? It has to be wrapped in a
>> bucket anyway to return it to the caller. You can still calculate the
>> size of the current buffer outside the aggregate bucket.
>
> I don't need the content in bucket form. My plan is to "read" from
> pending_t and call inject_to_parser(). This will likely be performed
> in the code block that calls serf_context_run(). The response handler
> for the network will append to the pending structures while the parser
> is paused or when pending data exists. When the parser is active, the
> response handler will inject its content (as the code is structured
> today). IOW, the response handler will not read from the pending data.
>
> Does the above sound reasonable?

While I like the general idea here, I'm unsure about the impact of
making a response handler that doesn't actually parses and uses the
complete response, but only stores it. What happens for instance when
the complete response was received, but the parsing failed?

I'll have a look at the rest of your changes to see how it all binds together.

Lieven

>
> Cheers,
> -g
>
Received on 2011-06-19 11:00:36 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.