Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Sun, 12 Sep 2010 16:21:03 -0700

On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> When I was working on my changes, I was looking for a "to UTF-8"
>> function that would return whether it actually re-encoded the input
>> string, but did not find one. The re-encoding function that I used,
>> `svn_subst_translate_string`, actually converts line endings to LF as
>> well as re-encodes from the given encoding to UTF-8, but it does not
>> inform the caller of whether it took either action. I guess that I
>> could write a utility function, kind of like a `strcmp`, but which
>> ignores any differences at line endings. Unfortunately, this adds
>> another scan through every property value that is encountered. Already
>> there is a noticeable decrease in the performance of the modified
>> `svnsync` as a result of calling `svn_subst_translate_string` on
>> basically every property value, and adding an additional scan through
>> each property value would decrease performance further.
>>
>
> Or you could insert the reencoding magic after (and separately from) the
> dos2unix magic, if that would make counting easier.  That said, what are
> you trying to count?  The number of properties where the reencoding
> wasn't a noop?

Re-encoding first and then normalizing the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding; `svn_subst_translate_string` does both simultaneously.

I removed the normalization counting code without much thought in my
haste to produce a version of `svnsync` that I could use to
mirror the GNU Nano repository. Currently, I am thinking that Stefan
Sperling's idea of a `svn_subst_translate_string2` function is the way
to go.
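
To make the idea concrete, here is a rough sketch of a translation
routine that also reports what it did. It is written in Python rather
than C for brevity. The name `translate_to_utf8`, its parameters, and
its return shape are made up for illustration; this is not the
signature of `svn_subst_translate_string2`, which is only a proposal at
this point.

```python
def translate_to_utf8(value: bytes, source_encoding: str):
    """Hypothetical helper: re-encode VALUE to UTF-8, normalize line
    endings to LF, and report which of the two steps changed anything.

    Returns (translated_bytes, was_reencoded, was_eol_normalized).
    """
    # Step 1: re-encode from the source encoding to UTF-8.
    utf8 = value.decode(source_encoding).encode("utf-8")
    was_reencoded = utf8 != value

    # Step 2: normalize CRLF and bare CR line endings to LF.
    normalized = utf8.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    was_eol_normalized = normalized != utf8

    return normalized, was_reencoded, was_eol_normalized


# Example: a Latin-1 value with a CRLF line ending changes in both steps.
result, reencoded, eol_fixed = translate_to_utf8(b"caf\xe9\r\n", "latin-1")
```

The comparisons in this sketch are themselves extra passes over the
data, so it only illustrates the interface. A C implementation would
presumably set the two flags while translating, avoiding the additional
scan.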

> Re performance, isn't svnsync bound by network speed?

Mostly yes. However, I have definitely noticed a decrease in
performance with my altered version (when using --source-encoding)
that cannot be explained by network speed. Granted, it's not that much
of a difference.

> Unrelatedly, you mentioned that in the repository you work on there are
> some properties in latin1 and some in utf8.  So one will need (until
> they fix the properties on their side) to svnsync a few revisions with
> translation enabled, then kill svnsync and restart with translation
> disabled, then restart again with it enabled etc.  Which makes me think,
> do we want a "sync up to N revisions" (or, "sync up to rN") switch?

It's like you are reading my mind :) I figured that I would work on
getting this change implemented and then work on such a feature.

> Have you sent a new version of the patch yet?

Oh, not yet. I'm still working on it.
Received on 2010-09-13 01:21:44 CEST
