[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r37278 - trunk/subversion/libsvn_client

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 17 Apr 2009 22:39:41 +0100

On Thu, Apr 16, 2009 at 02:29:02PM +0200, Greg Stein wrote:
> On Wed, Apr 15, 2009 at 21:48, Stefan Sperling <stsp_at_elego.de> wrote:
> >...
> > +++ trunk/subversion/libsvn_client/patch.c      Wed Apr 15 12:48:04 2009        (r37278)
> >...
> > @@ -1738,6 +1739,11 @@ extract_svnpatch(const char *original_pa
> >   return SVN_NO_ERROR;
> >  }
> >
> > +/* ### forward-declaration */
> > +static svn_error_t *
> > +apply_textdiffs(const char *patch_path, svn_client_ctx_t *ctx,
> > +                apr_pool_t *pool);
>
> Will this go away at some point? And could you put this at the TOP of
> the file, rather than hidden in the middle of the code?

It can only go away if we move the svn_client_patch() function to
the end of the file.

Right now, it's nice to have the svn-protocol-specific stuff at
the top, and the unidiff-specific stuff at the bottom.

I expect the svn-protocol-specific stuff to be changed a lot at some
point, when we clean it up then.

> >...
> > +/* --- Text-diff application routines --- */
> > +
> > +typedef struct patch_target_t {
>
> Does the structure actually need a name? Since it is typedef'd, that
> is kinda redundant.

I'll remove it.

> Could you also try to keep typedefs near the top of the file, rather
> than intermixed with all the code?

Well, same thing again. I'd like that struct to be close to functions
that use it. I consider the unidiff stuff separate from the rest of
that file. I'd even go as far as moving it into a file of its own.
It evolved from debugging code I had added to libsvn_client/patch.c,
that's the only reason it's there right now.
 
> While I recognize that a file may have 2^63 bytes, files that large
> are NOT going to be text files with more than an int's worth of line
> numbers. "filesize" is a strange name for "line number".

Yes. I was assuming that diff used line offsets until I got to writing
code that applys diffs. The unidiff-format spec that I read didn't
say that all numbers where in lines.

I will convert those to unsigned long.

> I will also note that svn's diff algorithms keep a small structure of
> data *per line*. If we ever run into files with more than 2 billion
> lines, then we're going to be in HUGE trouble.

heh

> (and no, I wouldn't be very opposed to Yet Another Typedef :-P ... say
> svn_lineno_t, maybe).

Might make the code more readable, yes. I'll add it to svn_types.h.

> >>..
> > +  /* EOL-marker used by target file. */
> > +  const char *eol_str;
> > +
> > +  /* EOL-marker used by patch file. */
> > +  const char *patch_eol_str;
>
> This structure describes the "target", so why is source information in here?

This might go away eventually. I'm not entirely sure it's going to be
needed. I will look at this again when implementing eol-style
normalisation for patching.
 
> >...
> > +copy_lines_to_target(patch_target_t *target, svn_filesize_t line,
> > +                     svn_boolean_t *eof, apr_pool_t *pool)
> > +{
> > +  svn_stream_t *s;
> > +  apr_pool_t *iterpool;
> > +
> > +  s = svn_stream_from_aprfile2(target->file, TRUE, pool);
> > +  *eof = FALSE;
> > +
> > +  iterpool = svn_pool_create(pool);
> > +  while (target->current_line < line && ! *eof)
> > +    {
> > +      svn_stringbuf_t *buf;
> > +      apr_size_t len;
> > +
> > +      svn_pool_clear(iterpool);
> > +
> > +      SVN_ERR(svn_stream_readline(s, &buf, target->eol_str, eof, pool));
> > +      svn_stringbuf_appendcstr(buf, target->eol_str);
> > +      target->current_line++;
> > +
> > +      len = buf->len;
> > +      SVN_ERR(svn_stream_write(target->result, buf->data, &len));
> > +      if (len < buf->len)
>
> By definition, a short write cannot occur without an error.

Great, so I can drop that check.

> >...
> > +  svn_stream_close(s);
> > +
> > +  return SVN_NO_ERROR;
>
> Woah! Can't ignore the error on that stream_close. I'd suggest returning it.

Ooops :)
Sorry I didn't do that on purpose. Good catch.

> > +/* Read at most NLINES from the TARGET file, returning lines read in *LINES,
> > + * separated by the EOL sequence specified in TARGET->patch_eol_str.
> > + * Allocate *LINES in result_pool.
> > + * Use SCRATCH_POOL for all other allocations. */
> > +static svn_error_t *
> > +read_lines_from_target(svn_string_t **lines, svn_filesize_t nlines,
> > +                       patch_target_t *target, apr_pool_t *scratch_pool,
> > +                       apr_pool_t *result_pool)
>
> Please correct the ordering of result/scratch_pool.

Already done (r37303).

> And what is keeping NLINES from being 1,000,000 ? And reading an
> ENTIRE file into memory? Unless there is a "just can't be done,
> otherwise" reason (like the diff code), then we cannot read
> filesize-proportionate contents into memory.

Ooops!

You're right, a hunk might be very big, or even an entire file when it
is being added (all lines in the diff start with +).

Good thing we have diff APIs for strings as well as files.
I'll try to put hunks into temporary files instead in-memory strings
if the number of bytes of the hunk is above an upper bound. We'll just
need to define a threshold. Does 64k sound reasonable or is that not
enough for you? :)
 
> >...
> > +  svn_stream_close(s);
>
> Gotta return, or clear, that error return.

Yes, will do.

>
> > +
> > +  target->current_line += i;
> > +  *lines = svn_string_create_from_buf(buf, result_pool);
>
> Why do you read the lines into iterpool, then copy them to
> scratch_pool, then copy them *again* to result_pool?

Because I applied our pool usage pragmas without thinking.

> >...
> > +/* Apply a HUNK to a patch TARGET. Do all allocations in POOL. */
> > +static svn_error_t *
> > +apply_one_hunk(svn_hunk_t *hunk, patch_target_t *target, apr_pool_t *pool)
> > +{
> > +  svn_filesize_t line;
> > +  svn_boolean_t eof;
> > +  svn_string_t *latest_text;
> > +  svn_diff_t *diff;
> > +  svn_diff_file_options_t *opts;
> > +
> > +  /* Determine the line the hunk should be applied at. */
> > +  line = determine_hunk_line(hunk, target);
> > +
> > +  if (target->current_line > line)
> > +    /* If we already passed the line that the hunk should be applied to,
> > +     * the hunks in the patch file are out of order. */
> > +    /* TODO: Warn, create reject file? */
> > +    return SVN_NO_ERROR;
>
> I'll repeat a comment from a previous review: please use ### for to-do markers.

If that helps you, I can add them.

> Also: if there is more than one line after an "if", then please wrap
> it in braces (yes, even tho it is a comment...)

If that makes you happy I can do it.

I might add that the above two points are not mentioned in HACKING
so I'll consider them your personal preference and not the top-most
priority tasks here -- you found bigger much problems I'd like to
address first.

> >...
> > +  SVN_ERR(svn_stream_close(target->result));
> > +  SVN_ERR(svn_io_file_close(target->file, pool));
>
> Do you ever seek on target->file? If not, then let the stream own the
> file (and don't even bother to remember the apr_file_t).

We might seek in it later. I'm not sure yet.

Thanks,
Stefan
Received on 2009-04-17 23:40:21 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.