RE: Re: "Save to Clipboard" including BOM marker

From: Gavin Lambert <colnet_at_mirality.co.nz>
Date: Wed, 15 Jul 2015 18:35:38 -0700 (PDT)

On 16/07/2015 06:58, Stefan Küng wrote:
> When you copy the patch to the clipboard, it is copied *exactly* the
> same as if it was written to a file. You can test that yourself with a
> clipboard spy utility if you like. The BOM is written as three bytes
> 0xEF 0xBB 0xBF.

This seems incorrect. The clipboard should only contain textual content; it should not include an initial BOM in any case. (*Files* contain an initial BOM because there is otherwise no reliable way to determine whether the content is ANSI or Unicode. The clipboard does not have that issue, since each clipboard format already implies its encoding.)

> The real problem is that the BOM in a patch file does not appear at the
> start of the content but in the middle where the diffed line is shown,
> and most editors can not handle that properly. Because for text files,
> the BOM only appears either at the beginning, or if in the middle should
> be treated as the "Word Joiner" character.

This shouldn't happen either. The BOM should be stripped from the file content prior to generating the diff.
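
As a rough illustration of what I mean (not TortoiseSVN's actual code), here is a minimal Python sketch that strips a leading UTF-8 BOM from both versions before diffing. The helper names `strip_bom` and `diff_without_bom` and the `a/file` / `b/file` labels are made up for the example, and it assumes both inputs are UTF-8.

```python
import codecs
import difflib


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def diff_without_bom(old: bytes, new: bytes) -> str:
    """Return a unified diff of two UTF-8 files, ignoring any leading BOM.

    Hypothetical helper: assumes both inputs are valid UTF-8.
    """
    old_lines = strip_bom(old).decode("utf-8").splitlines(keepends=True)
    new_lines = strip_bom(new).decode("utf-8").splitlines(keepends=True)
    # The BOM never reaches the diff, so it cannot show up mid-patch.
    return "".join(
        difflib.unified_diff(old_lines, new_lines, "a/file", "b/file")
    )
```

With this approach the first line of the file is diffed like any other line, so a change there does not drag the BOM into the body of the patch.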

> I can try to set both clipboard formats CF_TEXT *and* CF_UNICODETEXT
> myself when copying the patch file to the clipboard - that will make the
> CF_UNICODETEXT have the proper BOM bytes. But I'm not sure if that would
> fix the issue you're seeing.

You probably should set both of these, but you shouldn't be including the BOM in either.

The way that Windows expects you to deal with UTF-8 files is to load them into memory as UTF-16 (stripping the BOM in the process). If written to the clipboard, CF_UNICODETEXT should get the UTF-16 representation and CF_TEXT should get the ANSI (note: not UTF-8) representation in the active codepage at the time of the copy (not the time that the original file was written). If written to a file, then it should be converted back to UTF-8-with-BOM.
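
To make the conversions concrete, here is a minimal Python sketch of the byte payloads described above. It only builds the bytes and does not call the clipboard API. The function names are hypothetical, and the default `cp1252` stands in for "the active codepage", which is an assumption for the example (on Windows, Python's `mbcs` codec would pick up the real one).

```python
import codecs


def clipboard_payloads(file_bytes: bytes, ansi_codepage: str = "cp1252"):
    """Build the CF_UNICODETEXT and CF_TEXT payloads for a UTF-8 file.

    Hypothetical helper. ansi_codepage is an assumed stand-in for the
    codepage active at copy time.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
    text = file_bytes.decode("utf-8-sig")
    # CF_UNICODETEXT: UTF-16 (little-endian), NUL-terminated, no BOM.
    unicode_text = (text + "\0").encode("utf-16-le")
    # CF_TEXT: ANSI in the active codepage, NUL-terminated; characters
    # that the codepage cannot represent are replaced with '?'.
    ansi_text = (text + "\0").encode(ansi_codepage, errors="replace")
    return unicode_text, ansi_text


def file_payload(text: str) -> bytes:
    """Convert in-memory text back to UTF-8-with-BOM for writing to a file."""
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

The point is that the BOM exists only at the file boundary: it is dropped on load and re-added on save, and neither clipboard format ever sees it.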

Received on 2015-07-16 03:35:40 CEST

This is an archived mail posted to the TortoiseSVN Users mailing list.