Re: patch API review

From: Stefan Sperling <stsp_at_elego.de>
Date: Wed, 28 Apr 2010 20:43:58 +0200

On Wed, Apr 28, 2010 at 10:05:58AM -0500, Hyrum K. Wright wrote:
> I'm looking at adding support for the patch client API to JavaHL, but in
> doing so, I have a few questions about the API itself. I'm pretty late the
> game, so these questions might already be answered. If so, please point me
> to the relevant thread(s).
>
> For reference, here's the public API:
>
> svn_error_t *
> svn_client_patch(const char *abs_patch_path,
> const char *local_abspath,
> svn_boolean_t dry_run,
> int strip_count,
> svn_boolean_t reverse,
> const apr_array_header_t *include_patterns,
> const apr_array_header_t *exclude_patterns,
> apr_hash_t **patched_tempfiles,
> apr_hash_t **reject_tempfiles,
> svn_boolean_t ignore_whitespaces,
> svn_client_ctx_t *ctx,
> apr_pool_t *result_pool,
> apr_pool_t *scratch_pool);
>
> Firstly, the first parameter should probably be 'patch_abspath', which
> follows convention with other variable names.

Sure.

> That being said, is there a
> reason why the patch needs to be a file, instead of a stream? Using a
> stream seems much more flexible for the long-term evolution of the API.

Be prepared. Such a change will probably go deep into the diff parser.

The current svn patch and diff parsing code is optimized around never
allocating more memory than needed to process a single line.
Otherwise svn patch would run out of memory quickly for large files.
We currently support lines as long as memory can hold, rather than
limiting the size of the entire file by memory.

Furthermore the diff parsing code provides the patch application code
with streams mapped to portions of the patch file via the
svn_stream_from_aprfile_range_readonly() API.
We could now do something similar via mark / seek support in streams.
When I wrote the diff parser we did not yet have mark / seek support
in streams yet.

The mark / seek support is currently only available for streams backed
by APR files or stringbufs. IOW, making the patch file a stream will
not provide a lot of abstraction -- you'll either use a file anyway and
wrap a stream around it, or load the entire patch into memory and hog
tons of RAM for large patches (which is exactly what we want to avoid).

While I would not object to making the patch file a stream if it doesn't
hurt memory usage, I don't think it's worth re-implementing a lot of
the svn patch and diff parsing code just to provide callers with the
option of passing a stream. Why would they want to use a stream?
I have no problem with requiring callers of this API to create a tempfile
if they get the patch data from something other than a file. If callers
pass a stream we have no idea how much memory we'll end up using.
Granted, we could say that's the caller's problem, but even then I still
don't see enough benefit to justify the effort involved in changing
this so I won't do it myself.

> Despite my best efforts, I still haven't been able to fully grok the
> patched_tempfiles and reject_tempfiles parameters. However, if they are
> truly output parameters, they should be at the front of the parameter list,
> not hidden in the middle somewhere.

They are output parameters, so yes, we might want to move them.
They're primarily for use by TortoiseSVN and similar 3rd party clients,
not really a primary feature of the API. Hence it didn't look right to
me to put them at the front. But I don't mind moving them.

> Also, I know that the API uses
> notifications, but it might also be useful to return the _tempfile
> parameters through a callback mechanism, rather than as (arbitrarily large)
> hashes.

That would work as well, yes. Might look nicer than having the output
parameters at the front.

> Anyway, those are my thoughts. I'd go ahead and make these changes myself,
> but as I mentioned before, I'm not sure if there are Deep reasons for the
> current API.

Feel free to go ahead and make these changes. If you change the patch
file to a stream, please try not to make 'svn patch' use more memory
than it currently does.

Stefan
Received on 2010-04-28 20:46:24 CEST

This message: [ Message body ]
Next message: Hyrum K. Wright: "Re: patch API review"
Previous message: Bolstridge, Andrew: "RE: Feature idea: user-configurable post-commit notifications"
In reply to: Hyrum K. Wright: "patch API review"
Next in thread: Hyrum K. Wright: "Re: patch API review"
Reply: Hyrum K. Wright: "Re: patch API review"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]