[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Wed, 14 Oct 2020 13:30:13 +0900

On 2020/10/08 13:58, Yasuhito FUTATSUKI wrote:
> Hi,
>
> On 2020/10/08 13:12, Jun Omae wrote:
>> Hi all,
>>
>> On Sun, Oct 4, 2020 at 11:33 PM Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org> wrote:
>>> I commited the v3 patch in 1882234.
>>>
>>> I confirmed that on Windows in CP932, both of with and without
>>> the v3 patch, it cannot pass the file name to edit.
>>>
>>> For Windows I confirmed that it can be fixed by using _wsystem()
>>> with WCHAR string, so I commited it in 1882235.
>>
>> I found a minor issue on Windows after r1882235.
>>
>> When the editor command (via SVN_EDITOR environment and --editor-cmd option) has
>> non-ascii characters, it is unable to launch the editor.
>>
>> I think we should convert the editor command to UTF-16 encoding before passing it
>> to _wsystem(), otherwise r1882235 should be reverted.
>
> I confirmed it is a regression that introduced in r1882235. So I reverted it
> in 1882313.
>
> Then I'll try to resolve it by the way you suggest.

With the patch attached, I could confirm that I can invoke editor command
even if it contains non-ascii character and/or ascii space in path via
SVN_EDITOR environment variable, --editor-cmd option, and config file
setting (saved in active sytem code page). Especially with environment
variable, Unicode path which cannot represent with active code page can
be used. (I confirmed it on Windows, with VS2019 build).

log message:
[[[
Follow up to r1882234,r1882235,r1882313: Fix file name encoding issue
when invoking editor on Windows.

* subversion/libsvn_subr/cmdline.c
  (): include apr_env.h for apr_env_get
  (find_editor_binary):
    - Change the encoding to set to EDITOR, from active code page to
      UTF-8 on Windows.
      - Transcode editor_cmd to UTF-8.
      - Use apr_env_get() instead of getenv().
      - Transcode config setting for SVN_CONFIG_OPTION_EDITOR_CMD.
    - Add pool argument for transcoding and getting environment variables.
  (svn_cmdline_edit_file_externally, svn_cmdline_edit_string_externally):
    - Use _wsystem() instead of system() on Windows environment.
    - Enclose whole command strings with double quotes[1] in _wsystem().

[1] https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/cmd

Suggested by jun66j5 (editor return value in find_editor_binary)
]]]

However, I still have some considerations:

* Compatibility on use of _wsystem(). I used it for all Win32 environment,
  but isn't it compatibility issue here ?

* Consistency. It seems we use active system code page as encoding of
  native C string, but I use UTF-8 not to break file path.

* There is no way to pass those paths which cannot be represented in
  active codepage, from command line. It is because we use main()
  for program entry point, and its argv is transcoded to active code page
  by C run time library before main() is called. If we use _wmain(),
  we can obtain native UTF-16 argv, or if we use APR's libaprapp,
  we can use argv, transcoded from UTF-16 to UTF-8.

* Embeding native C string in error message. If system() or _wsystem()
  causes error, we make error message from "cmd" argument passed to
  system without transcode, and then we'll see the message that
  incorrectly transcoded as UTF-8 to native C encoding.

Thanks,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>

Received on 2020-10-14 06:31:24 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.