Arfrever Frehtes Taifersar Arahesis wrote on Sun, 25 Jan 2009 at 16:29 +0100:
> 2009-01-25 07:23:35 Daniel Shahaf napisaĆ(a):
> > > @@ -1074,6 +1075,14 @@ output_unified_diff_modified(void *baton
> > > {
> > > output_baton->hunk_extra_context[--p] = '\0';
> > > }
> > > + const char *invalid_character =
> > > + svn_utf__last_valid(output_baton->hunk_extra_context,
> > > + SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1);
> >
> > s/SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1/SVN_DIFF__EXTRA_CONTEXT_LENGTH + 1/ ?
>
> 15 lines earlier there is:
> /* Save the extra context for later use.
> * Note that the last byte of the hunk_extra_context array is never
> * touched after it is zero-initialized, so the array is always
> * 0-terminated. */
> strncpy(output_baton->hunk_extra_context,
> output_baton->extra_context->data,
> SVN_DIFF__EXTRA_CONTEXT_LENGTH);
>
Meaning that the first SVN_DIFF__EXTRA_CONTEXT_LENGTH bytes of HUNK_EXTRA_CONTEXT
are non-NUL at this point. (The (SVN_DIFF__EXTRA_CONTEXT_LENGTH+1)th bit is zero,
as the comment says.)
> svn_utf__last_valid()'s doc string contains:
> > /* Return a pointer to the first character after the last valid UTF-8
> > * multi-byte character in the string SRC of length LEN. If SRC is a valid
> > * UTF-8 the return value will point to the byte after SRC+LEN, otherwise
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okay. You pass:
> > > + svn_utf__last_valid(output_baton->hunk_extra_context,
> > > + SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1);
so, for you, the "byte after SRC+LEN" is
hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1]
meaning that only the characters
hunk_extra_context[0]
through
hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH - 2]
will be checked for UTF-8 validity. So you are checking two characters less
than the actual length of the string. (Again, it's declared as
'hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH + 1]'.) Meaning, that IMO
^^^^
you should increment the argument by 2, as I said originally. :-)
Makes sense?
> > * it will point to the start of the first invalid multi-byte character.
> > * In either case all the characters between SRC and the return pointer are
> > * valid UTF-8.
>
> So IMHO current code is correct.
>
>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1052364
Received on 2009-01-26 09:31:00 CET