On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> CC += dev@, and let me point you to our patch submission guidelines:
> http://subversion.apache.org/docs/community-guide/general.html#patches
>
> Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.
>
> (more below)
>
> Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
>> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dtrebbien_at_gmail.com> wrote:
>> > I think that a call to `svn_subst_translate_string`
>> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
>> > added in the `svnsync_normalize_revprops` function when `propname` is
>> > "svn:log".
>>
>> After applying the following patch to `subversion/svnsync/sync.c`, I
>> can confirm that the "svnsync: Error while replaying commit" error
>> disappears, allowing the sync to complete, and that the problem is
>> indeed a log message encoding issue:
>> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
>> index 8f7be9e..c7a5e38 100644
>> --- a/subversion/svnsync/sync.c
>> +++ b/subversion/svnsync/sync.c
>> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
>> /* And count this. */
>> (*normalized_count)++;
>> }
>> +
>> + if (strcmp(propname, "svn:log") == 0)
>> + {
>
> Should this use svn_prop_needs_translation()?
Though it does not appear so, the added lines are within the check for
`svn_prop_needs_translation`.
>> + svn_string_t *new_propval;
>> + SVN_ERR(svn_subst_translate_string(&new_propval, propval, "ISO-8859-1", pool));
>> + apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, propval = new_propval);
>
> Please, please, please, no assignment inside a function call. ^^^^^^^^^
Fixed
>> + (*normalized_count)++;
>> + }
>> }
>> }
>> return SVN_NO_ERROR;
>>
>>
>> Here I hard-coded the encoding, but I think that the encoding to use
>> should be retrieved from the Subversion config file. Now the questions
>> are:
>>
>> 1. What is the best way to pass the `log-encoding` option of the
>> [miscellany] section to the `svnsync_normalize_revprops` function?
>>
>
> What is 'log-encoding' normally used for? Only for recoding the commit
> message supplied by an editor, right? So I'm not sure we should use it
> here, perhaps a new command-line option will be better. (svnsync
> $subcommand --source-encoding=%s)
I like the idea of adding a command line option, so I think that this
is the way to go.
Do other properties need to be re-encoded, potentially? I was only
thinking about `svn:log`, but perhaps other properties might need
re-encoding as well. If not, then I think that the command line option
should be more specific (e.g.: `--source-log-encoding=%s`).
> And either way, the default behaviour should be to translate nothing.
> (If you always translate from latin1, you break syncs for people who,
> correctly, used UTF-8 log messages the entire time.)
I agree. The default behavior should definitely be to not re-encode
the log messages.
>> 2. Should `svnsync` always store the log messages in UTF-8 even though
>> the original repository log message encoding is different?
>>
>
> You have no choice on the matter: the RA API instructs you to supply it
> a UTF-8 log message. (IIRC, even the libsvn_client API expects an
> already-UTF-8 message.)
>
>> 3. Should there be separate configuration options for the log message
>> encoding of the source repository and the log message encoding of the
>> destination repository?
>
> No, see (2).
>
Another question: which line of code raises the "svnsync: Error while
replaying commit" error message? I would like to try to make this more
helpful.
Received on 2010-09-09 17:49:23 CEST