On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> When I was working on my changes, I was looking for a "to UTF-8"
>> function that would return whether it actually re-encoded the input
>> string, but did not find one. The re-encoding function that I used,
>> `svn_subst_translate_string`, actually converts line endings to LF as
>> well as re-encodes from the given encoding to UTF-8, but it does not
>> inform the caller of whether it took either action. I guess that I
>> could write a utility function, kind of like a `strcmp`, but which
>> ignores any differences at line endings. Unfortunately, this adds
>> another scan through every property value that is encountered. Already
>> there is a noticeable decrease in the performance of the modified
>> `svnsync` as a result of calling `svn_subst_translate_string` on
>> basically every property value, and adding an additional scan through
>> each property value would decrease performance further.
>>
>
> Or you could insert the reencoding magic after (and separately from) the
> dos2unix magic, if that would make counting easier. That said, what are
> you trying to count? The number of properties where the reencoding
> wasn't a noop?
Re-encoding first and then normalizing the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding; `svn_subst_translate_string` does both simultaneously.
I removed the normalization counting code without much thought in my
rush to produce a version of `svnsync` that I could use to
mirror the GNU Nano repository. Currently, I am thinking that Stefan
Sperling's idea of a `svn_subst_translate_string2` function is the way
to go.
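
To make the idea concrete, here is a rough, self-contained sketch in C of
a single-pass translation that also reports what it changed. It is not
Subversion code and does not reflect any agreed signature: the name
`latin1_to_utf8_lf` and both flag parameters are made up for
illustration, and it only handles Latin-1 input rather than an arbitrary
source encoding.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of Subversion.
   Copies `src` (assumed to be Latin-1) into `dst` as UTF-8, converting
   CRLF and lone CR line endings to LF in the same pass.
   *reencoded is set to 1 if any byte had to be re-encoded, else 0.
   *eols_changed is set to 1 if any line ending was rewritten, else 0.
   `dst` must have room for 2 * strlen(src) + 1 bytes. */
static void
latin1_to_utf8_lf(const char *src, char *dst,
                  int *reencoded, int *eols_changed)
{
  *reencoded = 0;
  *eols_changed = 0;

  while (*src != '\0')
    {
      unsigned char c = (unsigned char) *src++;

      if (c == '\r')
        {
          /* CR or CRLF becomes a single LF. */
          *dst++ = '\n';
          *eols_changed = 1;
          if (*src == '\n')
            src++;
        }
      else if (c >= 0x80)
        {
          /* Latin-1 0x80..0xFF maps to a two-byte UTF-8 sequence. */
          *dst++ = (char) (0xC0 | (c >> 6));
          *dst++ = (char) (0x80 | (c & 0x3F));
          *reencoded = 1;
        }
      else
        {
          /* Plain ASCII is identical in Latin-1 and UTF-8. */
          *dst++ = (char) c;
        }
    }
  *dst = '\0';
}
```

With something along these lines, the caller could count normalized
properties by checking the two flags, with no extra comparison scan over
each property value.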
> Re performance, isn't svnsync bound by network speed?
Mostly yes. However, I have definitely noticed a decrease in
performance with my altered version (when using --source-encoding)
that cannot be explained by network speed. Granted, it's not that much
of a difference.
> Unrelatedly, you mentioned that in the repository you work on there are
> some properties in latin1 and some in utf8. So one will need (until
> they fix the properties on their side) to svnsync a few revisions with
> translation enabled, then kill svnsync and restart with translation
> disabled, then restart again with it enabled etc. Which makes me think,
> do we want a "sync up to N revisions" (or, "sync up to rN") switch?
It's like you are reading my mind :) I figured that I would work on
getting this change implemented and then work on such a feature.
> Have you sent a new version of the patch yet?
Oh, not yet. I'm still working on it.
Received on 2010-09-13 01:21:44 CEST