[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sun, 12 Sep 2010 22:49:56 +0200

Daniel Trebbien wrote on Fri, Sep 10, 2010 at 16:57:41 -0700:
> >> * subversion/svnsync/main.c
> >>   (svnsync_cmd_table): Added svnsync_opt_source_encoding to the list
> >> of acceptable options for the init, sync, and copy-revprops
> >> subcommands.
> >
> > Missing indentation on the non-first lines.  Should be in one of these forms:
> Gmail added in the line breaks, but they aren't in my log message text
> file. Should I wrap the log message to a certain number of characters?

Yes, 79 or 80.

> >>   (log_properties_normalized): Removed this obsolete function.
> >
> > In other words, you dropped the entire "count changes" feature, without
> > any discussion.  (It's good that the log message is clear about this
> > change, though.)
> >
> > Please don't do that.  Someone spent some time writing and debugging
> > this feature, you can't just come in and drop their code.  If you really
> > think this is redundant, then start a discussion and get consensus to
> > drop this functionality.
> >
> > And at any rate, your "recode log messages" work needn't depend on it.
> > It can (IMO: should) even introduce counters for how many properties it
> > recoded.
> When I was working on my changes, I was looking for a "to UTF-8"
> function that would return whether it actually re-encoded the input
> string, but did not find one. The re-encoding function that I used,
> `svn_subst_translate_string`, actually converts line endings to LF as
> well as re-encodes from the given encoding to UTF-8, but it does not
> inform the caller of whether it took either action. I guess that I
> could write a utility function, kind of like a `strcmp`, but which
> ignores any differences at line endings. Unfortunately, this adds
> another scan through every property value that is encountered. Already
> there is a noticeable decrease in the performance of the modified
> `svnsync` as a result of calling `svn_subst_translate_string` on
> basically every property value, and adding an additional scan through
> each property value would decrease performance further.

Or you could insert the reencoding magic after (and separately from) the
dos2unix magic, if that would make counting easier. That said, what are
you trying to count? The number of properties where the reencoding
wasn't a noop?

Re performance, isn't svnsync bound by network speed?

Unrelatedly, you mentioned that in the repository you work on there are
soem properties in latin1 and some in utf8. So one will need (until
they fix the properties on their side) to svnsync a few revisions with
translation enabled, then kill svnsync and restart with translation
disabled, then restart again with it enabled etc. Which makes me think,
do we want a "sync up to N revisions" (or, "sync up to rN") switch?

> I for one do not think that the final "NOTE: Normalized ## properties
> to LF line endings" messages are particularly helpful because they do
> not say *which* properties of specific revisions were normalized. As a
> possible compromise, I could modify a Perl script that I wrote for the
> purpose of identifying all, non-ASCII property values from the GNU
> Nano repository to list revisions and property values that *would*
> require normalization and/or re-encoding. Just a thought...
> >> * subversion/svnsync/sync.c
> >>   Standardized terminology by replacing "normalize" with "translate".
> >
> > Just stick to the existing terminology please.
> Okay. I'll change it back.


> >> @@ -501,13 +507,11 @@
> >> +      SVN_ERR(translate_string(&value, &was_translated, eb->prop_encoding, pool));
> >
> > Please wrap to 80 characters.
> Fixed

Have you sent a new version of the patch yet?
Received on 2010-09-12 22:54:03 CEST

This is an archived mail posted to the Subversion Dev mailing list.