mangled ?\nnn encoding; os.popen; only in hook script

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2006-05-22 12:02:22 CEST

I've got an SVN repository containing files whose names contain
non-ascii characters. For purposes of discussion, we'll consider just
one such character:

"ü" - UNICODE (0x00FC = 252 = latin small letter u with diaeresis)
When encoded as UTF-8 this produces two bytes: hex: 0xC3, 0xBC;
decimal: 195, 188.
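
The byte values quoted above can be checked with a short sketch. The variable names are illustrative only, and the code assumes Python 2.6 or later (or Python 3), not the 2.3.4 interpreter discussed below.

```python
# U+00FC is "latin small letter u with diaeresis".
ch = u"\u00fc"

# Encode the single code point as UTF-8 and look at the resulting bytes.
encoded = ch.encode("utf-8")
byte_values = list(bytearray(encoded))

print(byte_values)                    # [195, 188]
print([hex(b) for b in byte_values])  # ['0xc3', '0xbc']
```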

When I perform svnlook -r 76 changed /production/subversion/hooktest
from the command line, the names of the files are returned correctly
as UTF-8 and displayed correctly in my terminal (which is set to
UTF-8).

An interactive python session (/usr/bin/python, '2.3.4 (#1, Mar 20
2006, 00:23:47) \n[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)]') also
behaves as expected. The leading "ü" (0x00FC) in the first two file
names is returned as UTF-8 from the svnlook subprocess and echoed
legibly to the terminal.

>>> print os.popen("svnlook -r 76 changed /production/subversion/hooktest").read()
_U übermittlung.txt
_U übermittlung2
_U FOO
_U scratch/

When, however, I run the *same* code in a hook script with the *same*
version of python, my previously nice UTF-8 bytes come back as what
I'll call "shellmangled" (a chunk out of the pre-commit script's log
file):

:cmd:
svnlook -t 76-1 changed /production/subversion/hooktest
:out:
_U ?\195?\188bermittlung.txt
_U ?\195?\188bermittlung2
_U FOO
_U scratch/

First, this is not some strange notation my editor uses for characters
it cannot display. The two bytes of the UTF-8 character really are
getting expanded to ten bytes: ? \ 1 9 5 ? \ 1 8 8. Second, what I'm
seeing are the decimal values of the UTF-8 bytes I want, printed as text.

**Questions
Why is there this difference?
Has anyone encountered something comparable when writing their hook scripts?
**

Now, I've written something (unshellmangle), which undoes the damage
and gives me back a nice UTF-8 string. Unfortunately, I need to use
these very file names in another call to svnlook (to check for the
presence of a property on *.txt files.)
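
For illustration, here is a minimal sketch of what such an unmangling step could look like. It is not the actual unshellmangle code, which is not shown in this message; it assumes every escape has the exact form ?\nnn with three decimal digits, and it is written for Python 2.6 or later (or Python 3).

```python
import re

# Matches one escape of the form ?\nnn, where nnn is a three-digit decimal byte value.
_ESCAPE = re.compile(br"\?\\(\d{3})")


def unshellmangle(mangled):
    """Replace each ?\\nnn escape in a byte string with the single byte nnn.

    Assumption: a literal "?\\195" never occurs in a real file name, since
    it could not be told apart from an escape. Values above 255 raise
    ValueError.
    """
    def to_byte(match):
        return bytes(bytearray([int(match.group(1))]))

    return _ESCAPE.sub(to_byte, mangled)


mangled = b"?\\195?\\188bermittlung.txt"
restored = unshellmangle(mangled)

# The two UTF-8 bytes had been expanded to ten bytes; this undoes that.
print(len(mangled) - len(restored))  # 8
```

The function works on bytes, not text, so the result can be decoded as UTF-8 afterwards or handed on unchanged.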

Now, I know that I can make this work from the interactive shell:

>>> print os.popen("svnlook -r 76 proplist /production/subversion/hooktest übermittlung.txt").read()
  foo
  svn:eol-style

But, when I try to do so from the hook script...

(1) using a UTF-8 string (my source file's encoding is set to UTF-8)

    cmd = r"svnlook -t %s proplist %s übermittlung.txt" % (TRANSACTION, REPOSITORY)
    log("cmd", cmd)
    log("out", os.popen(cmd).read())
    log("sys.getdefaultencoding()", sys.getdefaultencoding())

svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svn: Can't convert string from native encoding to 'UTF-8':
svn: ?\195?\188bermittlung.txt

(2) trying to imitate the ?\ notation I'm getting back (but that
obviously doesn't work).

    cmd = r"svnlook -t %s proplist %s ?\195?\188bermittlung.txt" % (TRANSACTION, REPOSITORY)
    log("cmd", cmd)
    log("out", os.popen(cmd).read())
    log("sys.getdefaultencoding()", sys.getdefaultencoding())

svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svnlook: Path '?195?188bermittlung.txt' does not exist

**Question
How can I pass the name of a file containing a non-ascii character to
svnlook from a running hook script?
**
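
One possible direction, offered as an assumption and not as something established in this message: Subversion starts hook scripts with an empty environment, so svnlook would run there in the "C" locale, where non-ASCII bytes can only be shown as ?\nnn escapes. The sketch below gives svnlook a UTF-8 locale and passes the arguments as a list so that no shell is involved. The helper names utf8_env and svnlook and the locale name en_US.UTF-8 are hypothetical, and the locale has to be installed on the server. The subprocess module needs Python 2.4 or later; on 2.3.4 the same idea would mean setting os.environ["LC_ALL"] before calling os.popen.

```python
import os
import subprocess


def utf8_env(locale_name="en_US.UTF-8"):
    """Return a copy of the current environment with a UTF-8 locale set.

    locale_name is an assumption; use a UTF-8 locale that exists on the server.
    """
    env = dict(os.environ)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    return env


def svnlook(args, env=None):
    """Run svnlook with a list of arguments (no shell) and return raw stdout bytes."""
    if env is None:
        env = utf8_env()
    proc = subprocess.Popen(["svnlook"] + list(args),
                            stdout=subprocess.PIPE, env=env)
    out, _ = proc.communicate()
    return out


# Hypothetical use inside the hook, with TRANSACTION and REPOSITORY as above:
#   svnlook(["-t", TRANSACTION, "proplist", REPOSITORY, path])
```

Passing a list avoids the shell entirely, which also sidesteps the lost backslashes seen in attempt (2).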

Thankful for any help
// Ben
Received on Mon May 22 12:03:44 2006

This is an archived mail posted to the Subversion Users mailing list.
