[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Sun, 20 Sep 2020 23:06:21 +0900

On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
> Hi,
>
> While I tried to make an example that escape_path() does not work as
> expected in specific locale such as ja_JP.SJIS or CP932 (suggested by
> Jun), I found it seems svn_cmdline__edit_file_externally() always
> passes the file name as UTF-8 even if the LC_CTYPE is other than
> UTF-8.
>
> I think it is also need svn_path_cstring_from_utf8() conversion for
> file_name in svn_cmdline__edit_file_externally().

As far as I read the code, svn_cmdline__edit_file_externally() is
called with paths come from svn_io_open_unique_file3() or
"local_abspath" from svn_client_conflict_t, so I believe it is
needed.

I wrote a patch address it.
(attached fix-edit-file-externally-patch.txt)
[[[
Fix file name to edit from utf8 to local style.

* subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
  Apply svn_path_cstring_from_utf8() to target file name.
]]]

> Here is a reproucing script. (Using ja_JP.UTF8 and ja_JP.SJIS locale).

As it is quite few system can use ja_JP.SJIS locale (I only know
FreeBSD and macOS) and there is also escape_path issue in SJIS locale,
I modified this script to use ja_JP.eucJP locale.

[[[
#!/bin/sh
# assuming UTF-8 encoding in this file

testdir=/tmp/svn-conflict-edit-filename-test

if [ ! -d ${testdir} ]; then
  mkdir -p ${testdir}
fi

reposdir=${testdir}/testrepo
reposurl=file://${reposdir}
:${svn:=svn}

svnadmin create ${reposdir}
cat > ${testdir}/record_filename.sh <<EOF
#!/bin/sh
LC_CTYPE=C; export LC_CTYPE
echo \$* > ${testdir}/svn-conflict-edit-file-name.txt
exit 0
EOF

LC_CTYPE=ja_JP.UTF-8 ; export LC_CTYPE

# add a file "予定表.txt" (it means schedule in Japanese)
# in UTF-8 working copy.
# "予定表.txt" represented in hex are followings:
# e4 ba 88 e5 ae 9a e8 a1 a8 2e 74 78 74 (UTF-8)
# 97 5c 92 e8 95 5c 2e 74 78 74 (SJIS; contains two '\'(== 0x5c))
# cd bd c4 ea 9c bd 2e 74 78 74 (EUC-JP)
schedfn_utf8="予定表.txt"
schedfn_sjis=`echo ${schedfn_utf8} | iconv -f utf-8 -t sjis`
schedfn_eucjp=`echo ${schedfn_utf8} | iconv -f utf-8 -t euc-jp`

${svn} checkout ${reposurl} ${testdir}/wc-utf-8
cd ${testdir}/wc-utf-8

cat > ${schedfn_utf8} <<EOF
2020/09/19 foo
EOF

${svn} add ${schedfn_utf8}
${svn} commit -m 'add schedule memo.'

# prepare EUC-JP locale wc.
(LC_CTYPE=ja_JP.eucJP; export LC_CTYPE ; \
    ${svn} checkout $reposurl ${testdir}/wc-eucjp)

# update the file in UTF-8 wc and commit it
cat >> ${schedfn_utf8} <<EOF
2020/09/20 bar
EOF
${svn} commit -m 'add schedule at 2020/09/20'

# add local modification in EUC-JP wc
LC_CTYPE=ja_JP.eucJP ; export LC_CTYPE
cd ${testdir}/wc-eucjp
cat >> ${schedfn_eucjp} <<EOF
2020/09/21 baz
EOF

${svn} update --force-interactive --accept edit \
  --editor-cmd "/bin/sh ${testdir}/record_filename.sh"

LC_CTYPE=C ; export LC_CTYPE
ls | od -t x1
od -t x1 ${testdir}/svn-conflict-edit-file-name.txt
]]]

The result with unpatched svn (last 4 lines):
[[[
0000000 cd bd c4 ea c9 bd 2e 74 78 74 0a
0000013
0000000 e4 ba 88 e5 ae 9a e8 a1 a8 2e 74 78 74 0a
0000016
]]]

The result with patched svn (last 4 lines):
[[[
0000000 cd bd c4 ea c9 bd 2e 74 78 74 0a
0000013
0000000 cd bd c4 ea c9 bd 2e 74 78 74 0a
0000013
]]]

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>

Received on 2020-09-20 16:07:58 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.