Dmitry Y. Labutin wrote:
> Hi, All.
>
> When I commit files with Russian filenames, the log message I receive by
> e-mail looks like this:
> ------------------------------------------------------------------------
> Author: kornyakov
> Date: 2007-05-12 14:29:15 +0400 (Sat, 12 May 2007)
> New Revision: 76
>
> Modified:
> Working Versions/?\208?\162?\208?\181?\209?\133?\208?\189?\208?\190?\208?\187?\208?\190?\208?\179?\208?\184?\208?\184/ITP/New Text
> Document.txt
> ------------------------------------------------------------------------
>
> I use the commit-email.pl script.
>
> How can I fix this?

I don't know why svn log emits this homegrown, half-baked Unicode
encoding; it isn't documented anywhere. But I can tell you what it
does: each non-ASCII code point is written out as the bytes of its
UTF-8 encoding, and each byte appears as three decimal digits prefixed
with ?\. So if you can fix commit-email.pl to set the character set of
the email to UTF-8 and to replace every ?\ddd with a single byte of
value ddd, you should get something readable.

For inspiration, below is a Python filter that converts svn log output
to the user's likely preferred encoding.

- Anders
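
# Written for Python 2 (byte strings with a .decode() method).
import fileinput, re, locale, sys

# Matches one escaped byte, e.g. ?\208
R = re.compile(r'\?\\([0-9][0-9][0-9])')

def decode(txt):
    def replacement(match):
        # Turn the three decimal digits back into the byte they stand for.
        return chr(int(match.groups()[0]))
    return R.sub(replacement, txt).decode('utf-8')

if __name__=='__main__':
    for line in fileinput.input():
        # Each line already ends with a newline, so write it unchanged
        # instead of letting print add a second one.
        sys.stdout.write(decode(line).encode(locale.getpreferredencoding(),
                                             'xmlcharrefreplace'))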
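
The filter above relies on Python 2 byte strings. As a minimal sketch
for Python 3, assuming the escapes always take the ?\ddd form seen in
the quoted log, the same substitution can be done on bytes before
decoding. The names decode_svn_escapes and build_utf8_mail are made up
for illustration and are not part of Subversion or commit-email.pl; the
second function only shows, using the standard email package, what a
message body labelled as UTF-8 might look like.

```python
import re
from email.mime.text import MIMEText

# One escaped byte as it appears in the quoted log, e.g. ?\208
ESCAPED_BYTE = re.compile(rb'\?\\([0-9]{3})')


def decode_svn_escapes(raw):
    """Replace each escaped byte with the real byte, then decode as UTF-8."""
    def to_byte(match):
        value = int(match.group(1))
        # Three digits can exceed 255; leave such a match untouched.
        return bytes([value]) if value <= 255 else match.group(0)
    # errors='replace' is an assumption of this sketch: malformed input
    # yields U+FFFD instead of raising an exception.
    return ESCAPED_BYTE.sub(to_byte, raw).decode('utf-8', errors='replace')


def build_utf8_mail(body):
    """Hypothetical helper: wrap text in a message declared as UTF-8."""
    return MIMEText(body, 'plain', 'utf-8')


# The directory name from the quoted log message.
sample = (rb'?\208?\162?\208?\181?\209?\133?\208?\189?\208?\190'
          rb'?\208?\187?\208?\190?\208?\179?\208?\184?\208?\184')
decoded = decode_svn_escapes(sample)  # 'Технологии'
```

Passing decoded to build_utf8_mail gives a message whose Content-Type
header declares charset utf-8, which is the other half of the fix
described above. How commit-email.pl itself would need to be changed is
not covered by this sketch.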
Received on Mon May 21 11:38:10 2007