[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Fri, 10 Sep 2010 16:57:41 -0700

>> * subversion/svnsync/main.c
>>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
>> of acceptable options for the init, sync, and copy-revprops
>> subcommands.
>
> Missing indentation on the non-first lines.  Should be in one of these forms:
>
> [[[
> * subversion/svnsync/main.c
>  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
>     of acceptable options for the init, sync, and copy-revprops subcommands.
> ]]]
>
> [[[
> * subversion/svnsync/main.c
>  (svnsync_cmd_table):
>     Added svnsync_opt_source_encoding to the list of acceptable options for
>     the init, sync, and copy-revprops subcommands.
> ]]]

Gmail added in the line breaks, but they aren't in my log message text
file. Should I wrap the log message to a certain number of characters?

>>   (log_properties_normalized): Removed this obsolete function.
>
> In other words, you dropped the entire "count changes" feature, without
> any discussion.  (It's good that the log message is clear about this
> change, though.)
>
> Please don't do that.  Someone spent some time writing and debugging
> this feature, you can't just come in and drop their code.  If you really
> think this is redundant, then start a discussion and get consensus to
> drop this functionality.
>
> And at any rate, your "recode log messages" work needn't depend on it.
> It can (IMO: should) even introduce counters for how many properties it
> recoded.

When I was working on my changes, I was looking for a "to UTF-8"
function that would return whether it actually re-encoded the input
string, but did not find one. The re-encoding function that I used,
`svn_subst_translate_string`, actually converts line endings to LF as
well as re-encodes from the given encoding to UTF-8, but it does not
inform the caller of whether it took either action. I guess that I
could write a utility function, kind of like a `strcmp`, but which
ignores any differences at line endings. Unfortunately, this adds
another scan through every property value that is encountered. Already
there is a noticeable decrease in the performance of the modified
`svnsync` as a result of calling `svn_subst_translate_string` on
basically every property value, and adding an additional scan through
each property value would decrease performance further.

I for one do not think that the final "NOTE: Normalized ## properties
to LF line endings" messages are particularly helpful because they do
not say *which* properties of specific revisions were normalized. As a
possible compromise, I could modify a Perl script that I wrote for the
purpose of identifying all, non-ASCII property values from the GNU
Nano repository to list revisions and property values that *would*
require normalization and/or re-encoding. Just a thought...

>> * subversion/svnsync/sync.c
>>   Standardized terminology by replacing "normalize" with "translate".
>
> Please don't do this.  It makes it harder to read the diff (it obscures
> the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
> and in my opinion "normalize" is the more accurate term for what you're
> doing now after the patch.  :-)
>
> Just stick to the existing terminology please.

Okay. I'll change it back.

>> @@ -785,7 +772,7 @@
>>  {
>>    const char *to_url, *from_url;
>>    svn_ra_session_t *to_session;
>> -  opt_baton_t *opt_baton = b;
>> +  opt_baton_t *opt_baton = (opt_baton_t*)b;
>
> Unnecessary change.  As a rule, patches shouldn't have them.

Reverted.

>> @@ -1625,9 +1589,9 @@
>>  /* SUBCOMMAND: help */
>>  static svn_error_t *
>> -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
>> +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
>>  {
>
> Why did you rename the formal parameter?  Isn't it another unnecessary
> change (as I explained above) that shouldn't be randomly included in
> a patch?

I renamed it because all of the other functions that take a void*
parameter for passing the opt_baton_t* call it `b`. It doesn't matter,
though. I'll change it back.

>> @@ -1982,6 +1951,8 @@
>>
>>    config = apr_hash_get(opt_baton.config, SVN_CONFIG_CATEGORY_CONFIG,
>>                          APR_HASH_KEY_STRING);
>> +
>> +  opt_baton.source_encoding = source_encoding;
>>
>
> It seems that most other options are parsed into opt_baton.foo directly,
> skipping a local variable.

Okay. I'll change that.

>> Index: subversion/svnsync/sync.c
>> ===================================================================
>> --- subversion/svnsync/sync.c (revision 995839)
>> +++ subversion/svnsync/sync.c (working copy)
>> @@ -45,29 +45,39 @@
>> -/* Normalize the line ending style of *STR, so that it contains only
>> - * LF (\n) line endings. After return, *STR may point at a new
>> - * svn_string_t* allocated from POOL.
>> +/* Translate the encoding and line ending style of *STR, so that it contains
>> + * only LF (\n) line endings and is encoded in UTF-8. After return, *STR may
>> + * point at a new svn_string_t* allocated from POOL if *WAS_TRANSLATED. If
>> + * ENCODING is not NULL, then *STR is presumed to be encoded in UTF-8.
>>   *
>> - * *WAS_NORMALIZED is set to TRUE when *STR needed to be normalized,
>> + * *WAS_TRANSLATED is set to TRUE when *STR needed to be translated,
>
> Do you want to maintain separate counters for "encoding had to be
> changed" and "linefeeds had to be fixed", or a combined counter?
>
> (As I said: if you think counters should be removed completely, make
> your case in a separate thread.  That's orthogonal to the encoding work.)

Would dev@ or users@ be better?

>> @@ -501,13 +507,11 @@
>> +      SVN_ERR(translate_string(&value, &was_translated, eb->prop_encoding, pool));
>
> Please wrap to 80 characters.

Fixed
Received on 2010-09-11 01:58:37 CEST

This is an archived mail posted to the Subversion Dev mailing list.