I've got an SVN repository containing files whose names contain
non-ASCII characters. For purposes of discussion, we'll consider just
one such character:
"ü" - Unicode (0x00FC = 252 = latin small letter u with diaeresis)
When encoded as UTF-8 this produces two bytes: hex: 0xC3, 0xBC;
decimal: 195, 188.
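As a quick check of those byte values, a minimal sketch (written to run on current Python as well as Python 2; the variable names are only illustrative):

```python
# Encode U+00FC as UTF-8 and show the resulting bytes in decimal and hex.
encoded = u"\u00fc".encode("utf-8")

# bytearray yields integers on both Python 2 and Python 3.
byte_values = list(bytearray(encoded))

print(byte_values)                          # [195, 188]
print(["0x%02X" % b for b in byte_values])  # ['0xC3', '0xBC']
```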
When I perform svnlook -r 76 changed /production/subversion/hooktest
from the command line, the names of the files are returned correctly
as UTF-8 and displayed correctly in my terminal (which is set to
UTF-8).
In an interactive Python session, /usr/bin/python ('2.3.4 (#1, Mar 20
2006, 00:23:47) \n[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)]') also
behaves as expected. The leading "ü" (0x00FC) in the first two file
names is returned as UTF-8 from the svnlook subprocess and echoed
legibly to the terminal.
>>> print os.popen("svnlook -r 76 changed /production/subversion/hooktest").read()
_U übermittlung.txt
_U übermittlung2
_U FOO
_U scratch/
When, however, I run the *same* code in a hook script in the *same*
version of python, my previously nice UTF-8 bytes come back as what
I'll call "shellmanged" (a chunk out of the pre-commit script's log
file):
:cmd:
svnlook -t 76-1 changed /production/subversion/hooktest
:out:
_U ?\195?\188bermittlung.txt
_U ?\195?\188bermittlung2
_U FOO
_U scratch/
First, this is not some strange notation my editor uses for characters
it cannot display. The two bytes of the UTF-8 character really are
getting expanded to ten bytes: ? \ 1 9 5 ? \ 1 8 8. Second, what I'm
seeing are the printed decimal values of the UTF-8 bytes I want, each
prefixed with "?\".
**Questions
Why is there this difference?
Has anyone encountered something comparable when writing their hook scripts?
**
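One way to narrow down the first question might be to log the locale-related environment from both the interactive session and the hook script, then compare the two. Subversion's documentation states that hooks are run with an empty environment, so LANG and friends may simply be unset there. Whether that accounts for the escaping seen here is an assumption the comparison would have to confirm. The helper name locale_report is hypothetical.

```python
import locale
import os


def locale_report(environ=None):
    """Return a printable summary of the locale settings a process sees."""
    if environ is None:
        environ = os.environ
    lines = []
    # These variables decide the "native encoding" of a program, in this
    # order of precedence.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        lines.append("%s=%r" % (name, environ.get(name)))
    # What Python itself believes the preferred encoding to be.
    lines.append("preferred encoding=%r" % locale.getpreferredencoding())
    return "\n".join(lines)


print(locale_report())
```

Calling this once from the shell and once through the hook's own log function should show whether the two contexts really differ.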
Now, I've written something (unshellmangle), which undoes the damage
and gives me back a nice UTF-8 string. Unfortunately, I need to use
these very file names in another call to svnlook (to check for the
presence of a property on *.txt files.)
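For readers who want to reproduce the workaround, here is a minimal sketch of what such an unshellmangle function could look like. It is an illustration, not the code from my hook script, and it assumes every mangled byte appears as "?\" followed by exactly three decimal digits, as in the log excerpt above.

```python
import re

# Assumption: one mangled byte == "?", "\", then exactly three decimal digits.
_ESCAPE = re.compile(br"\?\\(\d{3})")


def unshellmangle(raw):
    """Replace each ?\\ddd escape in a byte string with the byte it stands for."""
    def restore(match):
        value = int(match.group(1))
        if value > 255:
            # Not a valid byte value; leave the text untouched.
            return match.group(0)
        return bytes(bytearray([value]))
    return _ESCAPE.sub(restore, raw)


mangled = b"?\\195?\\188bermittlung.txt"
restored = unshellmangle(mangled)
print(restored == u"\u00fcbermittlung.txt".encode("utf-8"))  # True
```

A file name that legitimately contains the literal text "?\195" would be converted as well, so this is only safe as a stopgap.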
Now, I know that I can make this work from the interactive shell:
>>> print os.popen("svnlook -r 76 proplist /production/subversion/hooktest übermittlung.txt").read()
foo
svn:eol-style
But, when I try to do so from the hook script...
(1) using a UTF-8 string (my source file's encoding is set to UTF-8)
cmd = r"svnlook -t %s proplist %s übermittlung.txt" % (TRANSACTION, REPOSITORY)
log("cmd", cmd)
log("out", os.popen(cmd).read())
log("sys.getdefaultencoding()", sys.getdefaultencoding())
svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svn: Can't convert string from native encoding to 'UTF-8':
svn: ?\195?\188bermittlung.txt
(2) trying to imitate the ?\ notation I'm getting back (but that
obviously doesn't work).
cmd = r"svnlook -t %s proplist %s ?\195?\188bermittlung.txt" % (TRANSACTION, REPOSITORY)
log("cmd", cmd)
log("out", os.popen(cmd).read())
log("sys.getdefaultencoding()", sys.getdefaultencoding())
svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svnlook: Path '?195?188bermittlung.txt' does not exist
**Question
How can I pass the name of a file containing a non-ASCII character to
svnlook from a running hook script?
**
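One untested possibility, sketched under stated assumptions: if the hook's empty environment is the cause, giving the svnlook child process an explicit UTF-8 locale might be enough. The locale name en_US.UTF-8 is an assumption (it must exist on the server), the helper names are hypothetical, and the subprocess module used here is not available in the Python 2.3.4 mentioned above, so this presumes a newer interpreter.

```python
import os
import subprocess


def utf8_environment(base=None, locale_name="en_US.UTF-8"):
    """Copy an environment and force a UTF-8 locale in the copy."""
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides LANG and the other LC_* variables.
    env["LC_ALL"] = locale_name
    return env


def svnlook_proplist(repository, transaction, path, svnlook="svnlook"):
    """Run 'svnlook -t TXN proplist REPOS PATH' with a UTF-8 locale."""
    # Passing a list avoids a shell and its quoting rules entirely.
    cmd = [svnlook, "-t", transaction, "proplist", repository, path]
    return subprocess.check_output(cmd, env=utf8_environment())
```

Setting LC_ALL only for the child process keeps the change local to the svnlook call instead of altering the hook's own environment.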
Thankful for any help
// Ben
Received on Mon May 22 12:03:44 2006