[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn di -x-p: Valid UTF-8 sequence in function name is incorrectly divided

From: Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA_at_GMail.Com>
Date: Sat, 24 Jan 2009 05:48:28 +0100

2009-01-18 20:07:41 Arfrever Frehtes Taifersar Arahesis napisaƂ(a):
> `svn di -x-p` sometimes incorrectly divides UTF-8 characters in function names:
>
> $ ./subversion-invalid-UTF-8-sequence.sh
> + rm -fr repo wc
> + svnadmin create repo
> ++ pwd
> + svn co file:///home/Arfrever/repo wc
> Checked out revision 0.
> + cd wc
> + cat
> + svn add file.c
> A file.c
> + svn ci -m ''
> Adding file.c
> Transmitting file data .
> Committed revision 1.
> + sed -i -e 's/d/d = 1/' file.c
> + svn di -x-p
> Index: file.c
> ===================================================================
> --- file.c (revision 1)
> +++ file.c (working copy)
> @@ -3,5 +3,5subversion/libsvn_subr/utf.c:597: (apr_err=22)
> svn: Valid UTF-8 data
> (hex: c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc c5 bc)
> followed by invalid UTF-8 sequence
> (hex: c5 0a)
> + set +x

I'm attaching the patch which fixes this problem.
http://en.wikipedia.org/wiki/UTF-8#Description contains some helpful informations.

[[[
Properly divide function names when using `svn diff -x -p`.

* subversion/libsvn_diff/diff_file.c
  (delete_final_partial_character): New.
  (output_unified_diff_modified): Use delete_final_partial_character().
]]]
Index: subversion/libsvn_diff/diff_file.c
===================================================================
--- subversion/libsvn_diff/diff_file.c (revision 35441)
+++ subversion/libsvn_diff/diff_file.c (working copy)
@@ -1018,6 +1018,49 @@ output_unified_flush_hunk(svn_diff__file_output_ba
   return SVN_NO_ERROR;
 }
 
+/* Return line without bytes of final, partial character of INPUT_LINE.
+ If final character isn't partial, then full INPUT_LINE is returned. */
+static char *
+delete_final_partial_character(const char *input_line)
+{
+ unsigned char *line = calloc(SVN_DIFF__EXTRA_CONTEXT_LENGTH, sizeof(char));
+ strncpy((char *) line, input_line, SVN_DIFF__EXTRA_CONTEXT_LENGTH);
+ int end = strlen((char *) line) - 1;
+
+ /* First byte of multibyte character */
+ if ((line[end] >> 6) == 3)
+ {
+ line[end] = '\0';
+ }
+ /* Non-first byte of multibyte character */
+ else if ((line[end] >> 6) == 2)
+ {
+ int start;
+ for (start = end - 1; start; start--)
+ {
+ /* Non-first byte of multibyte character */
+ if ((line[start] >> 6) == 2)
+ {
+ continue;
+ }
+ if (! /* First byte of valid 2-byte character */
+ (((line[start] >> 5) == 6) && ((end - start + 1) == 2))
+ /* First byte of valid 3-byte character */
+ || (((line[start] >> 4) == 14) && ((end - start + 1) == 3))
+ /* First byte of valid 4-byte character */
+ || (((line[start] >> 3) == 30) && ((end - start + 1) == 4)))
+ {
+ for (; start == end; start++)
+ {
+ line[start] = '\0';
+ }
+ break;
+ }
+ }
+ }
+ return (char *) line;
+}
+
 static svn_error_t *
 output_unified_diff_modified(void *baton,
   apr_off_t original_start, apr_off_t original_length,
@@ -1074,6 +1117,11 @@ output_unified_diff_modified(void *baton,
             {
               output_baton->hunk_extra_context[--p] = '\0';
             }
+
+ char *extra_context = delete_final_partial_character(output_baton->hunk_extra_context);
+ strncpy(output_baton->hunk_extra_context,
+ extra_context,
+ SVN_DIFF__EXTRA_CONTEXT_LENGTH);
         }
     }

-- 
Arfrever Frehtes Taifersar Arahesis

Received on 2009-01-24 05:53:00 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.