[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: patch API review

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Wed, 28 Apr 2010 14:20:18 -0500

On Wed, Apr 28, 2010 at 1:43 PM, Stefan Sperling <stsp_at_elego.de> wrote:

> On Wed, Apr 28, 2010 at 10:05:58AM -0500, Hyrum K. Wright wrote:
> > I'm looking at adding support for the patch client API to JavaHL, but in
> > doing so, I have a few questions about the API itself. I'm pretty late
> the
> > game, so these questions might already be answered. If so, please point
> me
> > to the relevant thread(s).
> >
> > For reference, here's the public API:
> >
> > svn_error_t *
> > svn_client_patch(const char *abs_patch_path,
> > const char *local_abspath,
> > svn_boolean_t dry_run,
> > int strip_count,
> > svn_boolean_t reverse,
> > const apr_array_header_t *include_patterns,
> > const apr_array_header_t *exclude_patterns,
> > apr_hash_t **patched_tempfiles,
> > apr_hash_t **reject_tempfiles,
> > svn_boolean_t ignore_whitespaces,
> > svn_client_ctx_t *ctx,
> > apr_pool_t *result_pool,
> > apr_pool_t *scratch_pool);
> >
> > Firstly, the first parameter should probably be 'patch_abspath', which
> > follows convention with other variable names.
>
> Sure.
>
> > That being said, is there a
> > reason why the patch needs to be a file, instead of a stream? Using a
> > stream seems much more flexible for the long-term evolution of the API.
>
> Be prepared. Such a change will probably go deep into the diff parser.
>
> The current svn patch and diff parsing code is optimized around never
> allocating more memory than needed to process a single line.
> Otherwise svn patch would run out of memory quickly for large files.
> We currently support lines as long as memory can hold, rather than
> limiting the size of the entire file by memory.
>
> Furthermore the diff parsing code provides the patch application code
> with streams mapped to portions of the patch file via the
> svn_stream_from_aprfile_range_readonly() API.
> We could now do something similar via mark / seek support in streams.
> When I wrote the diff parser we did not yet have mark / seek support
> in streams yet.
>
> The mark / seek support is currently only available for streams backed
> by APR files or stringbufs. IOW, making the patch file a stream will
> not provide a lot of abstraction -- you'll either use a file anyway and
> wrap a stream around it, or load the entire patch into memory and hog
> tons of RAM for large patches (which is exactly what we want to avoid).
>
> While I would not object to making the patch file a stream if it doesn't
> hurt memory usage, I don't think it's worth re-implementing a lot of
> the svn patch and diff parsing code just to provide callers with the
> option of passing a stream. Why would they want to use a stream?
> I have no problem with requiring callers of this API to create a tempfile
> if they get the patch data from something other than a file. If callers
> pass a stream we have no idea how much memory we'll end up using.
> Granted, we could say that's the caller's problem, but even then I still
> don't see enough benefit to justify the effort involved in changing
> this so I won't do it myself.
>

This sounds like a very valid reason, so I won't touch anything here.

> > Despite my best efforts, I still haven't been able to fully grok the
> > patched_tempfiles and reject_tempfiles parameters. However, if they are
> > truly output parameters, they should be at the front of the parameter
> list,
> > not hidden in the middle somewhere.
>
> They are output parameters, so yes, we might want to move them.
> They're primarily for use by TortoiseSVN and similar 3rd party clients,
> not really a primary feature of the API. Hence it didn't look right to
> me to put them at the front. But I don't mind moving them.
>
> > Also, I know that the API uses
> > notifications, but it might also be useful to return the _tempfile
> > parameters through a callback mechanism, rather than as (arbitrarily
> large)
> > hashes.
>
> That would work as well, yes. Might look nicer than having the output
> parameters at the front.
>

I'll play around with this idea.

> > Anyway, those are my thoughts. I'd go ahead and make these changes
> myself,
> > but as I mentioned before, I'm not sure if there are Deep reasons for the
> > current API.
>
> Feel free to go ahead and make these changes. If you change the patch
> file to a stream, please try not to make 'svn patch' use more memory
> than it currently does

As noted, I won't change the file to a stream. That's out of my league. :)

-Hyrum
Received on 2010-04-28 21:20:57 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.