Re: svn commit: r37278 - trunk/subversion/libsvn_client

From: Greg Stein <gstein_at_gmail.com>
Date: Sat, 18 Apr 2009 00:43:11 +0200

Shweet... I don't have any specific line-by-line responses, and will
wait for your next revision. I'm happy to have pointed out spots of
concern, and see where ya go...

[and yes, it looks like a number of items I asked about are "in flux",
so no worries...]

On Fri, Apr 17, 2009 at 23:39, Stefan Sperling <stsp_at_elego.de> wrote:
> On Thu, Apr 16, 2009 at 02:29:02PM +0200, Greg Stein wrote:
>> On Wed, Apr 15, 2009 at 21:48, Stefan Sperling <stsp_at_elego.de> wrote:
>> >...
>> > +++ trunk/subversion/libsvn_client/patch.c Wed Apr 15 12:48:04 2009 (r37278)
>> >...
>> > @@ -1738,6 +1739,11 @@ extract_svnpatch(const char *original_pa
>> > return SVN_NO_ERROR;
>> > }
>> >
>> > +/* ### forward-declaration */
>> > +static svn_error_t *
>> > +apply_textdiffs(const char *patch_path, svn_client_ctx_t *ctx,
>> > + apr_pool_t *pool);
>>
>> Will this go away at some point? And could you put this at the TOP of
>> the file, rather than hidden in the middle of the code?
>
> It can only go away if we move the svn_client_patch() function to
> the end of the file.
>
> Right now, it's nice to have the svn-protocol-specific stuff at
> the top, and the unidiff-specific stuff at the bottom.
>
> I expect the svn-protocol-specific stuff to be changed a lot at some
> point, when we clean it up then.
>
>> >...
>> > +/* --- Text-diff application routines --- */
>> > +
>> > +typedef struct patch_target_t {
>>
>> Does the structure actually need a name? Since it is typedef'd, that
>> is kinda redundant.
>
> I'll remove it.
>
>> Could you also try to keep typedefs near the top of the file, rather
>> than intermixed with all the code?
>
> Well, same thing again. I'd like that struct to be close to functions
> that use it. I consider the unidiff stuff separate from the rest of
> that file. I'd even go as far as moving it into a file of its own.
> It evolved from debugging code I had added to libsvn_client/patch.c,
> that's the only reason it's there right now.
>
>> While I recognize that a file may have 2^63 bytes, files that large
>> are NOT going to be text files with more than an int's worth of line
>> numbers. "filesize" is a strange name for "line number".
>
> Yes. I was assuming that diff used line offsets until I got to writing
> code that applys diffs. The unidiff-format spec that I read didn't
> say that all numbers where in lines.
>
> I will convert those to unsigned long.
>
>> I will also note that svn's diff algorithms keep a small structure of
>> data *per line*. If we ever run into files with more than 2 billion
>> lines, then we're going to be in HUGE trouble.
>
> heh
>
>> (and no, I wouldn't be very opposed to Yet Another Typedef :-P ... say
>> svn_lineno_t, maybe).
>
> Might make the code more readable, yes. I'll add it to svn_types.h.
>
>> >>..
>> > + /* EOL-marker used by target file. */
>> > + const char *eol_str;
>> > +
>> > + /* EOL-marker used by patch file. */
>> > + const char *patch_eol_str;
>>
>> This structure describes the "target", so why is source information in here?
>
> This might go away eventually. I'm not entirely sure it's going to be
> needed. I will look at this again when implementing eol-style
> normalisation for patching.
>
>> >...
>> > +copy_lines_to_target(patch_target_t *target, svn_filesize_t line,
>> > + svn_boolean_t *eof, apr_pool_t *pool)
>> > +{
>> > + svn_stream_t *s;
>> > + apr_pool_t *iterpool;
>> > +
>> > + s = svn_stream_from_aprfile2(target->file, TRUE, pool);
>> > + *eof = FALSE;
>> > +
>> > + iterpool = svn_pool_create(pool);
>> > + while (target->current_line < line && ! *eof)
>> > + {
>> > + svn_stringbuf_t *buf;
>> > + apr_size_t len;
>> > +
>> > + svn_pool_clear(iterpool);
>> > +
>> > + SVN_ERR(svn_stream_readline(s, &buf, target->eol_str, eof, pool));
>> > + svn_stringbuf_appendcstr(buf, target->eol_str);
>> > + target->current_line++;
>> > +
>> > + len = buf->len;
>> > + SVN_ERR(svn_stream_write(target->result, buf->data, &len));
>> > + if (len < buf->len)
>>
>> By definition, a short write cannot occur without an error.
>
> Great, so I can drop that check.
>
>> >...
>> > + svn_stream_close(s);
>> > +
>> > + return SVN_NO_ERROR;
>>
>> Woah! Can't ignore the error on that stream_close. I'd suggest returning it.
>
> Ooops :)
> Sorry I didn't do that on purpose. Good catch.
>
>> > +/* Read at most NLINES from the TARGET file, returning lines read in *LINES,
>> > + * separated by the EOL sequence specified in TARGET->patch_eol_str.
>> > + * Allocate *LINES in result_pool.
>> > + * Use SCRATCH_POOL for all other allocations. */
>> > +static svn_error_t *
>> > +read_lines_from_target(svn_string_t **lines, svn_filesize_t nlines,
>> > + patch_target_t *target, apr_pool_t *scratch_pool,
>> > + apr_pool_t *result_pool)
>>
>> Please correct the ordering of result/scratch_pool.
>
> Already done (r37303).
>
>> And what is keeping NLINES from being 1,000,000 ? And reading an
>> ENTIRE file into memory? Unless there is a "just can't be done,
>> otherwise" reason (like the diff code), then we cannot read
>> filesize-proportionate contents into memory.
>
> Ooops!
>
> You're right, a hunk might be very big, or even an entire file when it
> is being added (all lines in the diff start with +).
>
> Good thing we have diff APIs for strings as well as files.
> I'll try to put hunks into temporary files instead in-memory strings
> if the number of bytes of the hunk is above an upper bound. We'll just
> need to define a threshold. Does 64k sound reasonable or is that not
> enough for you? :)
>
>> >...
>> > + svn_stream_close(s);
>>
>> Gotta return, or clear, that error return.
>
> Yes, will do.
>
>>
>> > +
>> > + target->current_line += i;
>> > + *lines = svn_string_create_from_buf(buf, result_pool);
>>
>> Why do you read the lines into iterpool, then copy them to
>> scratch_pool, then copy them *again* to result_pool?
>
> Because I applied our pool usage pragmas without thinking.
>
>> >...
>> > +/* Apply a HUNK to a patch TARGET. Do all allocations in POOL. */
>> > +static svn_error_t *
>> > +apply_one_hunk(svn_hunk_t *hunk, patch_target_t *target, apr_pool_t *pool)
>> > +{
>> > + svn_filesize_t line;
>> > + svn_boolean_t eof;
>> > + svn_string_t *latest_text;
>> > + svn_diff_t *diff;
>> > + svn_diff_file_options_t *opts;
>> > +
>> > + /* Determine the line the hunk should be applied at. */
>> > + line = determine_hunk_line(hunk, target);
>> > +
>> > + if (target->current_line > line)
>> > + /* If we already passed the line that the hunk should be applied to,
>> > + * the hunks in the patch file are out of order. */
>> > + /* TODO: Warn, create reject file? */
>> > + return SVN_NO_ERROR;
>>
>> I'll repeat a comment from a previous review: please use ### for to-do markers.
>
> If that helps you, I can add them.
>
>> Also: if there is more than one line after an "if", then please wrap
>> it in braces (yes, even tho it is a comment...)
>
> If that makes you happy I can do it.
>
> I might add that the above two points are not mentioned in HACKING
> so I'll consider them your personal preference and not the top-most
> priority tasks here -- you found bigger much problems I'd like to
> address first.
>
>> >...
>> > + SVN_ERR(svn_stream_close(target->result));
>> > + SVN_ERR(svn_io_file_close(target->file, pool));
>>
>> Do you ever seek on target->file? If not, then let the stream own the
>> file (and don't even bother to remember the apr_file_t).
>
> We might seek in it later. I'm not sure yet.
>
> Thanks,
> Stefan
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1776422
Received on 2009-04-18 00:43:35 CEST

This message: [ Message body ]
Next message: Greg Stein: "Re: svn commit: r37342 - trunk/build/ac-macros"
Previous message: Bert Huijben: "RE: patch_tests.py 4 on Windows"
In reply to: Stefan Sperling: "Re: svn commit: r37278 - trunk/subversion/libsvn_client"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]