
Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sat, 11 Sep 2010 01:30:53 +0300

Daniel Trebbien wrote on Fri, Sep 10, 2010 at 11:12:20 -0700:
> On Fri, Sep 10, 2010 at 9:16 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> > The patch didn't make it to the list (only to my personal copy), could
> > you please re-send with the patch in "text/plain" MIME type?
>
> Hmmm. I wonder if adding a .txt extension will work...

> [[[
> Add a command line option to the svnsync init, sync, and copy-revprops
> subcommands (--source-encoding) that allows the user to specify the
> character encoding of translatable properties from the source
> repository. This is needed to allow svnsync to import some older

s/import/sync/

> Subversion repositories that have properties that were not encoded in
> UTF-8.
>
> As discussed at:
> http://thread.gmane.org/gmane.comp.version-control.subversion.user/100020
>

Nice, I see in this link even the posts that were CCed only to dev@. gmane++

> * subversion/svnsync/main.c
> (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> of acceptable options for the init, sync, and copy-revprops
> subcommands.

Missing indentation on the non-first lines. Should be in one of these forms:

[[[
* subversion/svnsync/main.c
  (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
     of acceptable options for the init, sync, and copy-revprops subcommands.
]]]

[[[
* subversion/svnsync/main.c
  (svnsync_cmd_table):
     Added svnsync_opt_source_encoding to the list of acceptable options for
     the init, sync, and copy-revprops subcommands.
]]]

> (log_properties_normalized): Removed this obsolete function.

In other words, you dropped the entire "count changes" feature, without
any discussion. (It's good that the log message is clear about this
change, though.)

Please don't do that. Someone spent some time writing and debugging
this feature, you can't just come in and drop their code. If you really
think this is redundant, then start a discussion and get consensus to
drop this functionality.

And at any rate, your "recode log messages" work needn't depend on it.
It can (IMO: should) even introduce counters for how many properties it
recoded.

> * subversion/svnsync/sync.c
> Standardized terminology by replacing "normalize" with "translate".

Please don't do this. It makes it harder to read the diff (it obscures
the functional changes), it's a bikeshed (http://12a2a2.bikeshed.com),
and in my opinion "normalize" is the more accurate term for what you're
doing now after the patch. :-)

Just stick to the existing terminology please.

> * subversion/svnsync/sync.h

Overall, log message looks good, wrt amount of detail, etc. (But
I haven't compared it point-by-point with the actual patch.)

> ]]]
>

> +++ subversion/svnsync/main.c (working copy)
> @@ -104,8 +105,8 @@
> "the destination repository by any method other than 'svnsync'.\n"
> "In other words, the destination repository should be a read-only\n"
> "mirror of the source repository.\n"),
> - { SVNSYNC_OPTS_DEFAULT, 'q', svnsync_opt_allow_non_empty,
> - svnsync_opt_disable_locking } },
> + { SVNSYNC_OPTS_DEFAULT, svnsync_opt_source_encoding, 'q',
> + svnsync_opt_allow_non_empty, svnsync_opt_disable_locking } },

Apparently our convention is to append new options to the end of the
array. But that's not a major concern.

> @@ -200,6 +203,12 @@
> + {"source-encoding", svnsync_opt_source_encoding, 1,
> + N_("convert translatable properties from encoding ARG\n"
> + "to UTF-8. If not specified, then properties are\n"
> + "presumed to be encoded in UTF-8.")},

"Translatable properties" may be confusing to an end user, but I don't
have a better suggestion.

> @@ -227,6 +236,7 @@
> const char *sync_password;
> const char *config_dir;
> apr_hash_t *config;
> + const char *source_encoding;
> svn_boolean_t disable_locking;
> svn_boolean_t quiet;
> svn_boolean_t allow_non_empty;
> @@ -352,6 +362,9 @@
> svn_boolean_t quiet;
> svn_boolean_t allow_non_empty;
> const char *to_url;
> +
> + /* initialize, synchronize, and copy-revprops only */
> + const char *source_encoding;
>
> /* initialize only */
> const char *from_url;

This is just my preference, but could you use 'svn diff -x-p' so that
function/struct names show in the hunk headers?

@dev: how about making the patch submission guidelines recommend '-x-p'
be used regularly in patch submissions?

> @@ -785,7 +772,7 @@
> {
> const char *to_url, *from_url;
> svn_ra_session_t *to_session;
> - opt_baton_t *opt_baton = b;
> + opt_baton_t *opt_baton = (opt_baton_t*)b;

Unnecessary change. As a rule, patches shouldn't have them.

(the same cast is added again later in the diff)
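
To illustrate why the cast is redundant: in C (unlike C++), a void *
converts implicitly to any object pointer type, so the original
assignment was already fine. A minimal sketch, where demo_baton_t and
read_quiet() are hypothetical stand-ins and not svnsync code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for svnsync's opt_baton_t. */
typedef struct demo_baton_t
{
  int quiet;
} demo_baton_t;

/* Callback-style function that receives an untyped baton, the way a
   subcommand handler does. */
static int
read_quiet(void *b)
{
  /* No cast needed: C converts void * to demo_baton_t * implicitly. */
  demo_baton_t *baton = b;

  assert(baton != NULL);
  return baton->quiet;
}
```

Calling read_quiet(&some_baton) compiles without warnings as C; the
explicit cast would only be required if the file were built as C++.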

> @@ -1625,9 +1589,9 @@
> /* SUBCOMMAND: help */
> static svn_error_t *
> -help_cmd(apr_getopt_t *os, void *baton, apr_pool_t *pool)
> +help_cmd(apr_getopt_t *os, void *b, apr_pool_t *pool)
> {

Why did you rename the formal parameter? Isn't it another unnecessary
change (as I explained above) that shouldn't be randomly included in
a patch?

> @@ -1982,6 +1951,8 @@
>
> config = apr_hash_get(opt_baton.config, SVN_CONFIG_CATEGORY_CONFIG,
> APR_HASH_KEY_STRING);
> +
> + opt_baton.source_encoding = source_encoding;
>

It seems that most other options are parsed into opt_baton.foo directly,
skipping a local variable.
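
Roughly what I mean, as a self-contained sketch. The real code uses
apr_getopt_long(); the plain argv scan, demo_opt_baton_t and
parse_options() below are assumptions made up for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down option baton. */
typedef struct demo_opt_baton_t
{
  const char *source_encoding;  /* NULL means "assume UTF-8" */
  int quiet;
} demo_opt_baton_t;

/* Scan ARGV and store each recognized option straight into BATON,
   without an intermediate local variable.  Return 0 on success, or -1
   if --source-encoding is missing its argument. */
static int
parse_options(int argc, const char *const *argv, demo_opt_baton_t *baton)
{
  int i;

  assert(baton != NULL);
  baton->source_encoding = NULL;
  baton->quiet = 0;

  for (i = 1; i < argc; i++)
    {
      if (strcmp(argv[i], "--source-encoding") == 0)
        {
          if (i + 1 >= argc)
            return -1;
          baton->source_encoding = argv[++i];
        }
      else if (strcmp(argv[i], "-q") == 0)
        baton->quiet = 1;
    }
  return 0;
}
```

The point is only that the value lands in the baton at parse time, so
no later "opt_baton.source_encoding = source_encoding;" is needed.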

> Index: subversion/svnsync/sync.c
> ===================================================================
> --- subversion/svnsync/sync.c (revision 995839)
> +++ subversion/svnsync/sync.c (working copy)
> @@ -45,29 +45,39 @@
> -/* Normalize the line ending style of *STR, so that it contains only
> - * LF (\n) line endings. After return, *STR may point at a new
> - * svn_string_t* allocated from POOL.
> +/* Translate the encoding and line ending style of *STR, so that it contains
> + * only LF (\n) line endings and is encoded in UTF-8. After return, *STR may
> + * point at a new svn_string_t* allocated from POOL if *WAS_TRANSLATED. If
> * ENCODING is NULL, then *STR is presumed to be encoded in UTF-8.
> *
> - * *WAS_NORMALIZED is set to TRUE when *STR needed to be normalized,
> + * *WAS_TRANSLATED is set to TRUE when *STR needed to be translated,

Do you want to maintain separate counters for "encoding had to be
changed" and "linefeeds had to be fixed", or a combined counter?

(As I said: if you think counters should be removed completely, make
your case in a separate thread. That's orthogonal to the encoding work.)
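
For the sake of discussion, here is one way separate counters could
look. This is a sketch under loud assumptions: it uses malloc instead
of pools, handles only ISO-8859-1 as the source encoding, and
demo_counters_t and normalize_value() are hypothetical names, not
anything in svnsync:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters, one per kind of fix. */
typedef struct demo_counters_t
{
  int recoded;    /* values whose encoding had to be changed */
  int eol_fixed;  /* values whose line endings had to be fixed */
} demo_counters_t;

/* Return a newly malloc'ed, NUL-terminated copy of SRC (LEN bytes) with
   CRLF and lone CR turned into LF.  If FROM_LATIN1 is nonzero, treat
   the bytes as ISO-8859-1 and re-encode them as UTF-8.  Bump each
   counter in COUNTERS at most once per call.  Return NULL on
   allocation failure. */
static char *
normalize_value(const char *src, size_t len, int from_latin1,
                demo_counters_t *counters)
{
  /* Each input byte yields at most two output bytes. */
  char *dst = malloc(2 * len + 1);
  size_t i, j = 0;
  int recoded = 0, eol_fixed = 0;

  assert(counters != NULL);
  if (dst == NULL)
    return NULL;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)src[i];

      if (c == '\r')
        {
          if (i + 1 < len && src[i + 1] == '\n')
            i++;  /* swallow the LF of a CRLF pair */
          dst[j++] = '\n';
          eol_fixed = 1;
        }
      else if (from_latin1 && c >= 0x80)
        {
          /* Latin-1 code points 0x80..0xFF need two UTF-8 bytes. */
          dst[j++] = (char)(0xC0 | (c >> 6));
          dst[j++] = (char)(0x80 | (c & 0x3F));
          recoded = 1;
        }
      else
        dst[j++] = (char)c;
    }
  dst[j] = '\0';

  counters->recoded += recoded;
  counters->eol_fixed += eol_fixed;
  return dst;
}
```

A combined counter would just be "recoded || eol_fixed" per value;
keeping them apart lets the final report say which kind of fix
happened how often.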

> @@ -501,13 +507,11 @@
> + SVN_ERR(translate_string(&value, &was_translated, eb->prop_encoding, pool));

Please wrap to 80 characters.

>

Bottom line, could you fix these issues (in particular the "dropped
counters" and the "changed terminology" issues) and re-send the patch?

Thanks,

Daniel
Received on 2010-09-11 00:35:11 CEST

This is an archived mail posted to the Subversion Dev mailing list.
