
Re: Problem with non-English characters in file names

From: Anders J. Munch <ajm_at_flonidan.dk>
Date: 2007-05-21 11:37:47 CEST

Dmitry Y. Labutin wrote:
> Hi, All.
> When I commit files with Russian file names, the log message I receive
> by e-mail looks like this:
> ------------------------------------------------------------------------
> Author: kornyakov
> Date: 2007-05-12 14:29:15 +0400 (Sat, 12 May 2007)
> New Revision: 76
> Modified:
> Working Versions/?\208?\162?\208?\181?\209?\133?\208?\189?\208?\190?\208?\187?\208?\190?\208?\179?\208?\184?\208?\184/ITP/New Text Document.txt
> ------------------------------------------------------------------------
> I use commit-email.pl script.
> How can I fix this?

I don't know why svn log spits out this homegrown, half-baked Unicode
encoding (it isn't documented anywhere), but I can tell you what it
does: each byte of the UTF-8 encoding of a non-ASCII code point is
written as three decimal digits prefixed with ?\, for example ?\208.
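As a worked example, here is a minimal Python 3 sketch that applies exactly that rule to the directory name from the quoted mail. It assumes the text is plain ASCII apart from the escapes, and the function name unescape_svn is made up for illustration. Each pair such as ?\208?\162 is the two-byte UTF-8 form of one Cyrillic letter, and the whole name decodes to "Технологии" ("Technologies").

```python
import re

# svn's escape: a literal "?\" followed by three decimal digits giving one byte value.
ESCAPE = re.compile(rb'\?\\([0-9]{3})')


def unescape_svn(text: str) -> str:
    """Replace every ?\\ddd with the single byte ddd, then decode the bytes as UTF-8."""
    raw = text.encode('ascii')  # assumption: everything not escaped is plain ASCII
    raw = ESCAPE.sub(lambda m: bytes([int(m.group(1))]), raw)
    return raw.decode('utf-8')


# The directory name from the quoted commit mail (ten two-byte UTF-8 sequences).
sample = ('?\\208?\\162?\\208?\\181?\\209?\\133?\\208?\\189?\\208?\\190'
          '?\\208?\\187?\\208?\\190?\\208?\\179?\\208?\\184?\\208?\\184')
name = unescape_svn(sample)  # 'Технологии'
```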

So if you can fix commit-email.pl to set the character set of the
email to UTF-8, and to replace every ?\ddd with the single byte whose
value is ddd, you should get something readable.
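commit-email.pl is a Perl script, so the following is not a patch for it. It is a minimal Python 3 sketch of the same two steps using the standard email package: turn the ?\ddd escapes back into bytes, then wrap the decoded text in a MIME part whose charset is utf-8. The helper name build_commit_mail and its parameters are hypothetical, and the log text is assumed to be ASCII apart from the escapes.

```python
import re
from email.mime.text import MIMEText

# svn's escape: "?\" plus three decimal digits, each standing for one UTF-8 byte.
ESCAPE = re.compile(rb'\?\\([0-9]{3})')


def build_commit_mail(log_text: str, sender: str, recipient: str,
                      subject: str) -> MIMEText:
    """Unescape an svn log and wrap it in a text/plain message declared as UTF-8."""
    raw = ESCAPE.sub(lambda m: bytes([int(m.group(1))]),
                     log_text.encode('ascii'))
    # Passing 'utf-8' as the charset sets Content-Type: text/plain; charset="utf-8"
    # and base64-encodes the body so the non-ASCII bytes survive transport.
    msg = MIMEText(raw.decode('utf-8'), 'plain', 'utf-8')
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    return msg
```

The resulting message can be handed to smtplib's SMTP.send_message(); a mail client that honours the charset parameter should then display the Cyrillic names.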

For inspiration, below is a Python 2 filter that converts svn log output
to the user's likely preferred encoding.

- Anders

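import fileinput, re, locale

# svn writes each byte of a non-ASCII UTF-8 sequence as ?\ddd (decimal)
R = re.compile(r'\?\\([0-9][0-9][0-9])')

def decode(txt):
    # Replace each ?\ddd with the single byte ddd, then decode as UTF-8
    def replacement(match):
        return chr(int(match.group(1)))
    return R.sub(replacement, txt).decode('utf-8')

if __name__ == '__main__':
    for line in fileinput.input():
        # 'replace' (assumed) prints '?' for characters the locale can't encode;
        # the trailing comma stops print from adding a second newline
        print decode(line).encode(locale.getpreferredencoding(), 'replace'),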
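Since Python 2 has reached end of life, here is a minimal Python 3 port of the filter for anyone reusing it today. It is a sketch rather than the author's code: lines are read as bytes (fileinput's mode='rb'), so each escape can be turned straight into a byte, and output keeps the 'replace' error handler. Decoding with 'replace' is a choice made here so that a malformed escape sequence yields U+FFFD instead of aborting the filter.

```python
import fileinput
import locale
import re
import sys

# "?\" followed by three decimal digits, matched against bytes rather than text.
R = re.compile(rb'\?\\([0-9]{3})')


def decode(line: bytes) -> str:
    """Replace each ?\\ddd escape with byte ddd, then decode the line as UTF-8."""
    return R.sub(lambda m: bytes([int(m.group(1))]), line).decode('utf-8', 'replace')


if __name__ == '__main__':
    enc = locale.getpreferredencoding()
    # With no file arguments, fileinput reads stdin (as bytes, because of mode='rb').
    for line in fileinput.input(mode='rb'):
        # Characters the locale's encoding cannot represent are written as '?'.
        sys.stdout.buffer.write(decode(line).encode(enc, 'replace'))
```

It is used like the original, e.g. `svn log -v | python3 svnlog_filter.py`, where the script name is arbitrary.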

Received on Mon May 21 11:38:10 2007

This is an archived mail posted to the Subversion Users mailing list.
