[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r35446 - trunk/subversion/libsvn_diff

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Mon, 26 Jan 2009 01:56:17 +0200 (Jerusalem Standard Time)

Arfrever Frehtes Taifersar Arahesis wrote on Sun, 25 Jan 2009 at 16:29 +0100:
> 2009-01-25 07:23:35 Daniel Shahaf napisaƂ(a):
> > > @@ -1074,6 +1075,14 @@ output_unified_diff_modified(void *baton
> > > {
> > > output_baton->hunk_extra_context[--p] = '\0';
> > > }
> > > + const char *invalid_character =
> > > + svn_utf__last_valid(output_baton->hunk_extra_context,
> > > + SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1);
> >
> > s/SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1/SVN_DIFF__EXTRA_CONTEXT_LENGTH + 1/ ?
>
> 15 lines earlier there is:
> /* Save the extra context for later use.
> * Note that the last byte of the hunk_extra_context array is never
> * touched after it is zero-initialized, so the array is always
> * 0-terminated. */
> strncpy(output_baton->hunk_extra_context,
> output_baton->extra_context->data,
> SVN_DIFF__EXTRA_CONTEXT_LENGTH);
>

Meaning that the first SVN_DIFF__EXTRA_CONTEXT_LENGTH bytes of HUNK_EXTRA_CONTEXT
are non-NUL at this point. (The (SVN_DIFF__EXTRA_CONTEXT_LENGTH+1)th bit is zero,
as the comment says.)

> svn_utf__last_valid()'s doc string contains:
> > /* Return a pointer to the first character after the last valid UTF-8
> > * multi-byte character in the string SRC of length LEN. If SRC is a valid
> > * UTF-8 the return value will point to the byte after SRC+LEN, otherwise
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Okay. You pass:

> > > + svn_utf__last_valid(output_baton->hunk_extra_context,
> > > + SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1);

so, for you, the "byte after SRC+LEN" is

    hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH - 1]

meaning that only the characters

    hunk_extra_context[0]
    through
    hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH - 2]

will be checked for UTF-8 validity. So you are checking two characters less
than the actual length of the string. (Again, it's declared as
'hunk_extra_context[SVN_DIFF__EXTRA_CONTEXT_LENGTH + 1]'.) Meaning, that IMO
                                                  ^^^^
you should increment the argument by 2, as I said originally. :-)

Makes sense?

> > * it will point to the start of the first invalid multi-byte character.
> > * In either case all the characters between SRC and the return pointer are
> > * valid UTF-8.
>
> So IMHO current code is correct.
>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1052364
Received on 2009-01-26 09:31:00 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.