[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Thu, 9 Sep 2010 13:48:46 +0300

CC += dev@, and let me point you to our patch submission guidelines:
http://subversion.apache.org/docs/community-guide/general.html#patches

Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.

(more below)

Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dtrebbien_at_gmail.com> wrote:
> > I think that a call to `svn_subst_translate_string`
> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
> > added in the `svnsync_normalize_revprops` function when `propname` is
> > "svn:log".
>
> After applying the following patch to `subversion/svnsync/sync.c`, I
> can confirm that the "svnsync: Error while replaying commit" error
> disappears, allowing the sync to complete, and that the problem is
> indeed a log message encoding issue:
> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
> index 8f7be9e..c7a5e38 100644
> --- a/subversion/svnsync/sync.c
> +++ b/subversion/svnsync/sync.c
> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
> /* And count this. */
> (*normalized_count)++;
> }
> +
> + if (strcmp(propname, "svn:log") == 0)
> + {

Should this use svn_prop_needs_translation()?

> + svn_string_t *new_propval;
> + SVN_ERR(svn_subst_translate_string(&new_propval, propval, "ISO-8859-1", pool));
> + apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, propval = new_propval);

Please, please, please, no assignment inside a function call. ^^^^^^^^^

> + (*normalized_count)++;
> + }
> }
> }
> return SVN_NO_ERROR;
>
>
> Here I hard-coded the encoding, but I think that the encoding to use
> should be retrieved from the Subversion config file. Now the questions
> are:
>
> 1. What is the best way to pass the `log-encoding` option of the
> [miscellany] section to the `svnsync_normalize_revprops` function?
>

What is 'log-encoding' normally used for? Only for recoding the commit
message supplied by an editor, right? So I'm not sure we should use it
here, perhaps a new command-line option will be better. (svnsync
$subcommand --source-encoding=%s)

If you'd like to use a config option, get it from 'svn_config_t *config'
in main(), stash it in the subcommand baton, and drill it down as deep
as needed.

And either way, the default behaviour should be to translate nothing.
(If you always translate from latin1, you break syncs for people who,
correctly, used UTF-8 log messages the entire time.)

> 2. Should `svnsync` always store the log messages in UTF-8 even though
> the original repository log message encoding is different?
>

You have no choice on the matter: the RA API instructs you to supply it
a UTF-8 log message. (IIRC, even the libsvn_client API expects an
already-UTF-8 message.)

> 3. Should there be separate configuration options for the log message
> encoding of the source repository and the log message encoding of the
> destination repository?

No, see (2).
Received on 2010-09-09 12:52:56 CEST

This is an archived mail posted to the Subversion Dev mailing list.