Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Sun, 12 Sep 2010 16:21:03 -0700

On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> When I was working on my changes, I was looking for a "to UTF-8"
>> function that would return whether it actually re-encoded the input
>> string, but did not find one. The re-encoding function that I used,
>> `svn_subst_translate_string`, actually converts line endings to LF as
>> well as re-encodes from the given encoding to UTF-8, but it does not
>> inform the caller of whether it took either action. I guess that I
>> could write a utility function, kind of like a `strcmp`, but which
>> ignores any differences at line endings. Unfortunately, this adds
>> another scan through every property value that is encountered. Already
>> there is a noticeable decrease in the performance of the modified
>> `svnsync` as a result of calling `svn_subst_translate_string` on
>> basically every property value, and adding an additional scan through
>> each property value would decrease performance further.
>>
>
> Or you could insert the reencoding magic after (and separately from) the
> dos2unix magic, if that would make counting easier.  That said, what are
> you trying to count?  The number of properties where the reencoding
> wasn't a noop?

To re-encode and then normalize the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding; `svn_subst_translate_string` does both simultaneously.
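
A minimal sketch of the two-step approach, assuming the source encoding
is ISO-8859-1 and using plain `malloc` rather than APR pools. The
helpers `latin1_to_utf8` and `normalize_eols` are hypothetical names
made up for illustration; they are not Subversion functions. Each step
reports whether it changed anything, which is what the counting code
would need:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Subversion API): re-encode a NUL-terminated
   ISO-8859-1 string as UTF-8.  Sets *changed to true if at least one byte
   had to be re-encoded.  The caller frees the result; returns NULL if
   allocation fails. */
char *latin1_to_utf8(const char *in, bool *changed)
{
    assert(in != NULL && changed != NULL);
    size_t len = strlen(in);
    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    char *p = out;

    *changed = false;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c;                    /* ASCII is unchanged */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));    /* lead byte */
            *p++ = (char)(0x80 | (c & 0x3F));  /* continuation byte */
            *changed = true;
        }
    }
    *p = '\0';
    return out;
}

/* Hypothetical helper: convert CRLF and lone CR to LF.  Sets *changed to
   true if any line ending was rewritten. */
char *normalize_eols(const char *in, bool *changed)
{
    assert(in != NULL && changed != NULL);
    size_t len = strlen(in);
    char *out = malloc(len + 1);
    char *p = out;

    *changed = false;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\r') {
            *p++ = '\n';
            /* in[i + 1] is at worst the terminating NUL, so this is safe. */
            if (in[i + 1] == '\n')
                i++;
            *changed = true;
        } else {
            *p++ = in[i];
        }
    }
    *p = '\0';
    return out;
}
```

The cost of this layout is the one raised earlier in the thread: two
passes and two allocations per property value instead of one.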

I removed the normalization counting code without much thought, in my
haste to produce a version of `svnsync` that I could use to mirror the
GNU Nano repository. Currently, I am thinking that Stefan Sperling's
idea of a `svn_subst_translate_string2` function, one that tells the
caller which translations it actually performed, is the way to go.
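
To illustrate the idea, here is a toy, single-pass stand-in written in
plain C. It is only a sketch under stated assumptions: the function name
`translate_latin1`, its out-parameters, and the hard-coded ISO-8859-1
source encoding are all invented for this example and say nothing about
what the eventual Subversion function would look like. The point is that
both flags fall out of the one scan that the translation already needs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in, NOT Subversion's API: translate a NUL-terminated
   ISO-8859-1 string to UTF-8 with LF line endings in a single pass.
   Either out-parameter may be NULL if the caller does not care.
   The caller frees the result; returns NULL if allocation fails. */
char *translate_latin1(const char *in,
                       bool *translated_to_utf8,
                       bool *translated_line_endings)
{
    assert(in != NULL);
    size_t len = strlen(in);
    /* Worst case: every input byte expands to two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    char *p = out;
    bool enc = false, eol = false;

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == '\r') {
            /* CRLF or lone CR becomes LF. */
            *p++ = '\n';
            if (in[i + 1] == '\n')
                i++;
            eol = true;
        } else if (c < 0x80) {
            *p++ = (char)c;
        } else {
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
            enc = true;
        }
    }
    *p = '\0';

    if (translated_to_utf8 != NULL)
        *translated_to_utf8 = enc;
    if (translated_line_endings != NULL)
        *translated_line_endings = eol;
    return out;
}
```

A caller such as `svnsync` could then bump its counters from the two
flags without rescanning the property value.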

> Re performance, isn't svnsync bound by network speed?

Mostly yes. However, I have definitely noticed a decrease in
performance with my altered version (when using --source-encoding)
that cannot be explained by network speed. Granted, it's not that much
of a difference.

> Unrelatedly, you mentioned that in the repository you work on there are
> some properties in latin1 and some in utf8.  So one will need (until
> they fix the properties on their side) to svnsync a few revisions with
> translation enabled, then kill svnsync and restart with translation
> disabled, then restart again with it enabled etc.  Which makes me think,
> do we want a "sync up to N revisions" (or, "sync up to rN") switch?

It's like you are reading my mind :) I figured that I would work on
getting this change implemented and then work on such a feature.

> Have you sent a new version of the patch yet?

Oh, not yet. I'm still working on it.
Received on 2010-09-13 01:21:44 CEST

This is an archived mail posted to the Subversion Dev mailing list.