Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Fri, 10 Sep 2010 16:57:41 -0700

>> * subversion/svnsync/main.c
>> Â (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
>> of acceptable options for the init, sync, and copy-revprops
>> subcommands.
>
> Missing indentation on the non-first lines. Â Should be in one of these forms:
>
> [[[
> * subversion/svnsync/main.c
> Â (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> Â Â of acceptable options for the init, sync, and copy-revprops subcommands.
> ]]]
>
> [[[
> * subversion/svnsync/main.c
> Â (svnsync_cmd_table):
> Â Â Added svnsync_opt_source_encoding to the list of acceptable options for
> Â Â the init, sync, and copy-revprops subcommands.
> ]]]

Gmail added in the line breaks, but they aren't in my log message text
file. Should I wrap the log message to a certain number of characters?

>> Â (log_properties_normalized): Removed this obsolete function.
>
> In other words, you dropped the entire "count changes" feature, without
> any discussion. Â (It's good that the log message is clear about this
> change, though.)
>
> Please don't do that. Â Someone spent some time writing and debugging
> this feature, you can't just come in and drop their code. Â If you really
> think this is redundant, then start a discussion and get consensus to
> drop this functionality.
>
> And at any rate, your "recode log messages" work needn't depend on it.
> It can (IMO: should) even introduce counters for how many properties it
> recoded.

When I was working on my changes, I was looking for a "to UTF-8"
function that would return whether it actually re-encoded the input
string, but did not find one. The re-encoding function that I used,
`svn_subst_translate_string`, actually converts line endings to LF as
well as re-encodes from the given encoding to UTF-8, but it does not
inform the caller of whether it took either action. I guess that I
could write a utility function, kind of like a `strcmp`, but which
ignores any differences at line endings. Unfortunately, this adds
another scan through every property value that is encountered. Already
there is a noticeable decrease in the performance of the modified
`svnsync` as a result of calling `svn_subst_translate_string` on
basically every property value, and adding an additional scan through
each property value would decrease performance further.

I for one do not think that the final "NOTE: Normalized ## properties
to LF line endings" messages are particularly helpful because they do
not say *which* properties of specific revisions were normalized. As a
possible compromise, I could modify a Perl script that I wrote for the
purpose of identifying all, non-ASCII property values from the GNU
Nano repository to list revisions and property values that *would*
require normalization and/or re-encoding. Just a thought...

>> * subversion/svnsync/sync.c
>> Â Standardized terminology by replacing "normalize" with "translate".
>
> Please don't do this. Â It makes it harder to read the diff (it obscures
> the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
> and in my opinion "normalize" is the more accurate term for what you're
> doing now after the patch. Â :-)
>
> Just stick to the existing terminology please.

Okay. I'll change it back.

>> @@ -785,7 +772,7 @@
>> Â {
>> Â Â const char *to_url, *from_url;
>> Â Â svn_ra_session_t *to_session;
>> - Â opt_baton_t *opt_baton = b;
>> + Â opt_baton_t *opt_baton = (opt_baton_t*)b;
>
> Unnecessary change. Â As a rule, patches shouldn't have them.

Reverted.

>> @@ -1625,9 +1589,9 @@
>> Â /* SUBCOMMAND: help */
>> Â static svn_error_t *
>> -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
>> +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
>> Â {
>
> Why did you rename the formal parameter? Â Isn't it another unnecessary
> change (as I explained above) that shouldn't be randomly included in
> a patch?

I renamed it because all of the other functions that take a void*
parameter for passing the opt_baton_t* call it `b`. It doesn't matter,
though. I'll change it back.

>> @@ -1982,6 +1951,8 @@
>>
>> Â Â config = apr_hash_get(opt_baton.config, SVN_CONFIG_CATEGORY_CONFIG,
>> Â Â Â Â Â Â Â Â Â Â Â Â Â APR_HASH_KEY_STRING);
>> +
>> + Â opt_baton.source_encoding = source_encoding;
>>
>
> It seems that most other options are parsed into opt_baton.foo directly,
> skipping a local variable.

Okay. I'll change that.

>> Index: subversion/svnsync/sync.c
>> ===================================================================
>> --- subversion/svnsync/sync.c (revision 995839)
>> +++ subversion/svnsync/sync.c (working copy)
>> @@ -45,29 +45,39 @@
>> -/* Normalize the line ending style of *STR, so that it contains only
>> - * LF (\n) line endings. After return, *STR may point at a new
>> - * svn_string_t* allocated from POOL.
>> +/* Translate the encoding and line ending style of *STR, so that it contains
>> + * only LF (\n) line endings and is encoded in UTF-8. After return, *STR may
>> + * point at a new svn_string_t* allocated from POOL if *WAS_TRANSLATED. If
>> + * ENCODING is not NULL, then *STR is presumed to be encoded in UTF-8.
>> Â *
>> - * *WAS_NORMALIZED is set to TRUE when *STR needed to be normalized,
>> + * *WAS_TRANSLATED is set to TRUE when *STR needed to be translated,
>
> Do you want to maintain separate counters for "encoding had to be
> changed" and "linefeeds had to be fixed", or a combined counter?
>
> (As I said: if you think counters should be removed completely, make
> your case in a separate thread. Â That's orthogonal to the encoding work.)

Would dev@ or users@ be better?

>> @@ -501,13 +507,11 @@
>> + Â Â Â SVN_ERR(translate_string(&value, &was_translated, eb->prop_encoding, pool));
>
> Please wrap to 80 characters.

Fixed
Received on 2010-09-11 01:58:37 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]